Biomarkers and methods for diagnosis of early stage pancreatic ductal adenocarcinoma

ABSTRACT

The present invention provides biomarker compositions and methods for the diagnosis and prognosis of PDAC. In a particular embodiment, the invention provides methods and compositions for screening, diagnosis and prognosis of early stage, asymptomatic PDAC.

This application claims the benefit of priority of U.S. provisional application Ser. No. 61/990,029, filed May 7, 2014, the entire contents of which are incorporated herein by reference.

The invention relates generally to the field of personalized medicine and, more specifically, to compositions and methods for diagnosis of early stage pancreatic ductal adenocarcinoma (PDAC).

BACKGROUND

Pancreatic ductal adenocarcinoma (PDAC) is an aggressive tumor with poor survival rates, in part because most cases are initially diagnosed too late in the course of the disease for potentially curative resection. With the highest mortality rate of any cancer, PDAC has a 5-year survival rate of less than 5%. Despite extensive efforts in recent years, advancement in treatments has been meager. Surgical resection remains the only curative intervention, but only 18% of patients are diagnosed at early stages when surgical resection is an option. The remaining 82% of patients present with advanced disease at diagnosis. For these patients, disease management is limited to palliation. A substantial determinant of the lethality of PDAC is late presentation due to asymptomatic development, and earlier detection would improve outcomes by identifying the disease while it is still amenable to potentially curative intervention.

Screening programs designed to detect early stage PDAC in asymptomatic populations face considerable challenges, not the least of which is the need for a highly accurate test that would limit false-positive identifications in this relatively rare disease. On the basis of the differential accumulation of mutations in primary and metastatic lesions, a recent study estimated that an average of 11.7 years elapsed from tumor initiation to overt cancer development and an average of 6.8 years elapsed between the development of overt cancer and the development of metastatic disease. The finding that pancreatic tumors are present for a significant period of time before clinical manifestation emphasizes the potential for screening and early detection. Unfortunately, even the best currently available blood-based biomarker, CA 19-9, has significant shortcomings, including unacceptably low accuracy, that limit its use to monitoring disease progression. No single biomarker or combination of a few biomarkers has emerged that markedly improves on CA 19-9 diagnostic accuracy.

There is a great need to identify biomarkers that can be combined into an accurate and cost-effective biomarker panel that would be a useful screening tool in asymptomatic populations. The present invention addresses this need by providing compositions and methods for the diagnosis of early stage PDAC. Related methods and advantages are provided as well.

SUMMARY

The present invention provides compositions and methods for the diagnosis and prognosis of PDAC. In a particular embodiment, the invention provides methods and compositions for screening, diagnosis and prognosis of early stage, asymptomatic PDAC.

In one aspect, the invention provides a panel of isolated biomarkers comprising N of the biomarkers listed in Tables 1 through 4. In another aspect, the invention provides a panel of isolated biomarkers comprising N of the biomarkers listed in Table 1, 2, 3 or 4. In one aspect, the invention provides a panel of isolated biomarkers comprising N of the biomarkers listed in two or more of Tables 1 through 4. In some embodiments, N is a number selected from the group consisting of 2 to 30.

In some embodiments, the biomarker panel comprises at least two of the isolated biomarkers selected from the group consisting of activated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG), AXL receptor tyrosine kinase (AXL), BCL2-associated athanogene 3 (BAG3), basigin (BSG, EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein) (CEACAM1), collagen, type XVIII, alpha 1 (endostatin) (COL18A1), epithelial cell adhesion molecule (EPCAM), soluble hyaluronic acid (HA), haptoglobin (HP), intercellular adhesion molecule 1 (ICAM1), insulin-like growth factor binding protein 2 (IGFBP2), insulin-like growth factor binding protein 4 (IGFBP4), lipocalin 2 (LCN2, NGAL), leucine-rich alpha-2-glycoprotein 1 (LRG1), matrix metallopeptidase 2 (MMP2, gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase), matrix metallopeptidase 7 (MMP7, matrilysin, uterine), matrix metallopeptidase 9 (MMP9, gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7), platelet basic protein (PPBP), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1, hevin), secreted phosphoprotein 1 (SPP1, osteopontin, OPN), transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin 1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A), and vascular endothelial growth factor C (VEGFC).

In additional embodiments, the biomarker panel comprises at least two of the isolated biomarkers selected from the group consisting of activated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG), AXL receptor tyrosine kinase (AXL), alpha-2-glycoprotein 1, zinc-binding (AZGP1), BCL2-associated athanogene 3 (BAG3), basigin (BSG) (EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) (CEA), carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1, biliary glycoprotein), collagen, type XVIII, alpha 1 (endostatin) (COL18A1), epithelial cell adhesion molecule (EPCAM), gelsolin (GSN), soluble hyaluronic acid (HA), haptoglobin (HP), intercellular adhesion molecule 1 (ICAM1), insulin-like growth factor binding protein 2 (IGFBP2), insulin-like growth factor binding protein 4 (IGFBP4), lipocalin 2 (LCN2) (NGAL), LIM and senescent cell antigen-like domains 1 (LIMS1) (PINCH), leucine-rich alpha-2-glycoprotein 1 (LRG1), lactoferrin (LTF), matrix metallopeptidase 11 (MMP11) (stromelysin 3), matrix metallopeptidase 2 (MMP2) (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase), matrix metallopeptidase 7 (MMP7) (matrilysin, uterine), matrix metallopeptidase 9 (MMP9) (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7), platelet factor 4 (PF4), plectin (PLEC), platelet basic protein (PPBP), proteoglycan 4 (PRG4), serum amyloid A (SAA), SPARC-like 1 (SPARCL1) (hevin), secreted phosphoprotein 1 (SPP1) (osteopontin, OPN), transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin 1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A), and vascular endothelial growth factor C (VEGFC).

In some embodiments, the biomarker panel comprises one or more peptides comprising a fragment of a biomarker selected from activated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG), AXL receptor tyrosine kinase (AXL), BCL2-associated athanogene 3 (BAG3), basigin (BSG, EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein) (CEACAM1), collagen, type XVIII, alpha 1 (endostatin) (COL18A1), epithelial cell adhesion molecule (EPCAM), soluble hyaluronic acid (HA), haptoglobin (HP), intercellular adhesion molecule 1 (ICAM1), insulin-like growth factor binding protein 2 (IGFBP2), insulin-like growth factor binding protein 4 (IGFBP4), lipocalin 2 (LCN2, NGAL), leucine-rich alpha-2-glycoprotein 1 (LRG1), matrix metallopeptidase 2 (MMP2, gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase), matrix metallopeptidase 7 (MMP7, matrilysin, uterine), matrix metallopeptidase 9 (MMP9, gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7), platelet basic protein (PPBP), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1, hevin), secreted phosphoprotein 1 (SPP1, osteopontin, OPN), transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin 1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A), and vascular endothelial growth factor C (VEGFC).

In additional embodiments, the biomarker panel comprises one or more peptides comprising a fragment of a biomarker selected from activated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG), AXL receptor tyrosine kinase (AXL), alpha-2-glycoprotein 1, zinc-binding (AZGP1), BCL2-associated athanogene 3 (BAG3), basigin (BSG) (EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) (CEA), carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1, biliary glycoprotein), collagen, type XVIII, alpha 1 (endostatin) (COL18A1), epithelial cell adhesion molecule (EPCAM), gelsolin (GSN), soluble hyaluronic acid (HA), haptoglobin (HP), intercellular adhesion molecule 1 (ICAM1), insulin-like growth factor binding protein 2 (IGFBP2), insulin-like growth factor binding protein 4 (IGFBP4), lipocalin 2 (LCN2) (NGAL), LIM and senescent cell antigen-like domains 1 (LIMS1) (PINCH), leucine-rich alpha-2-glycoprotein 1 (LRG1), lactoferrin (LTF), matrix metallopeptidase 11 (MMP11) (stromelysin 3), matrix metallopeptidase 2 (MMP2) (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase), matrix metallopeptidase 7 (MMP7) (matrilysin, uterine), matrix metallopeptidase 9 (MMP9) (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7), platelet factor 4 (PF4), plectin (PLEC), platelet basic protein (PPBP), proteoglycan 4 (PRG4), serum amyloid A (SAA), SPARC-like 1 (SPARCL1) (hevin), secreted phosphoprotein 1 (SPP1) (osteopontin, OPN), transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin 1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A), and vascular endothelial growth factor C (VEGFC).

In some embodiments, the panel of isolated biomarkers comprises at least two of the isolated biomarkers selected from the group consisting of BCL2-associated athanogene 3 (BAG3), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1), soluble hyaluronic acid (HA), insulin-like growth factor binding protein 2 (IGFBP2), DJ-1 protein (PARK7), and secreted phosphoprotein 1 (SPP1). In further embodiments, the panel of isolated biomarkers can be used in methods to distinguish healthy individuals from individuals with early stage PDAC.

In additional embodiments, the panel of isolated biomarkers comprises at least two of the isolated biomarkers selected from the group consisting of BCL2-associated athanogene 3 (BAG3), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1), epithelial cell adhesion molecule (EPCAM), lipocalin 2 (LCN2), mesothelin (MSLN), DJ-1 protein (PARK7), proteoglycan 4 (PRG4), secreted phosphoprotein 1 (SPP1), and tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A). In further embodiments, the panel of isolated biomarkers can be used in methods to distinguish healthy individuals and individuals afflicted with chronic pancreatitis from individuals with early stage PDAC.

In additional embodiments, the panel of isolated biomarkers comprises at least two of the isolated biomarkers selected from the group consisting of activated leukocyte cell adhesion molecule (ALCAM), BCL2-associated athanogene 3 (BAG3), basigin (BSG), soluble hyaluronic acid (HA), DJ-1 protein (PARK7), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1) (hevin), and transforming growth factor, beta-induced, 68 kDa (TGFBI). In related embodiments, the panels of isolated biomarkers can further comprise basigin (BSG).

In some embodiments, the panel of isolated biomarkers comprises one or more peptides comprising a fragment of a biomarker selected from BCL2-associated athanogene 3 (BAG3), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5), soluble hyaluronic acid (HA), insulin-like growth factor binding protein 2 (IGFBP2), DJ-1 protein (PARK7), and secreted phosphoprotein 1 (SPP1). In further embodiments, the panel of isolated biomarkers can be used in methods to distinguish healthy individuals from individuals with early stage PDAC.

In additional embodiments, the panel of isolated biomarkers comprises one or more peptides comprising a fragment of a biomarker selected from BCL2-associated athanogene 3 (BAG3), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5), epithelial cell adhesion molecule (EPCAM), lipocalin 2 (LCN2), mesothelin (MSLN), DJ-1 protein (PARK7), proteoglycan 4 (PRG4), secreted phosphoprotein 1 (SPP1), and tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A). In further embodiments, the panel of isolated biomarkers can be used in methods to distinguish healthy individuals and individuals afflicted with chronic pancreatitis from individuals with early stage PDAC.

In additional embodiments, the panel of isolated biomarkers comprises one or more peptides comprising a fragment of a biomarker selected from activated leukocyte cell adhesion molecule (ALCAM), BCL2-associated athanogene 3 (BAG3), basigin (BSG), soluble hyaluronic acid (HA), DJ-1 protein (PARK7), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1) (hevin), and transforming growth factor, beta-induced, 68 kDa (TGFBI). In related embodiments, the panels of isolated biomarkers can further comprise basigin (BSG).

In further embodiments, the biomarker panel comprises at least two of the isolated biomarkers selected from the group consisting of miR-100a, miR-1290, miR-155, miR-18a, miR-196a, miR-21, miR-210, miR-221, miR-24, miR-31, miR-375, and miR-885.

In additional embodiments, the biomarker panel comprises at least two of the isolated biomarkers selected from the group consisting of KRAS, SMAD4, CDKN2A and TP53.

In additional embodiments, the biomarker panel comprises at least two of the isolated biomarkers selected from the group consisting of activated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG), AXL receptor tyrosine kinase (AXL), alpha-2-glycoprotein 1, zinc-binding (AZGP1), BCL2-associated athanogene 3 (BAG3), basigin (BSG) (EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) (CEA), carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1) (biliary glycoprotein), collagen, type XVIII, alpha 1 (endostatin) (COL18A1), epithelial cell adhesion molecule (EPCAM), gelsolin (GSN), soluble hyaluronic acid (HA), haptoglobin (HP), intercellular adhesion molecule 1 (ICAM1), insulin-like growth factor binding protein 2 (IGFBP2), insulin-like growth factor binding protein 4 (IGFBP4), lipocalin 2 (LCN2) (NGAL), LIM and senescent cell antigen-like domains 1 (LIMS1) (PINCH), leucine-rich alpha-2-glycoprotein 1 (LRG1), lactoferrin (LTF), matrix metallopeptidase 11 (MMP11) (stromelysin 3), matrix metallopeptidase 2 (MMP2) (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase), matrix metallopeptidase 7 (MMP7) (matrilysin, uterine), matrix metallopeptidase 9 (MMP9) (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7), platelet factor 4 (PF4), plectin (PLEC), platelet basic protein (PPBP), proteoglycan 4 (PRG4), serum amyloid A (SAA), SPARC-like 1 (SPARCL1) (hevin), secreted phosphoprotein 1 (SPP1) (osteopontin, OPN), transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin 1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A), vascular endothelial growth factor C (VEGFC), miR-100a, miR-1290, miR-155, miR-18a, miR-196a, miR-21, miR-210, miR-221, miR-24, miR-31, miR-375, miR-885, KRAS, SMAD4, CDKN2A and TP53.

In one embodiment, the present disclosure includes methods for determining a probability for early stage pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability for early stage PDAC in the individual.

In some embodiments, the methods of the invention can be used to determine the probability for survival of an individual diagnosed with PDAC.

In some embodiments, the methods of the invention can be used to prognose the length of survival of an individual diagnosed with PDAC.

In additional embodiments, the methods of the invention can be used to determine the probability for disease recurrence in an individual diagnosed with PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases.

In one embodiment, the methods of the invention can be used to determine a probability of a positive response to therapy for pancreatic ductal adenocarcinoma (PDAC) in an individual.

Other features and advantages of the invention will be apparent from the detailed description, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows results from plasma BSG determinations by ELISA. The relative distribution of BSG in the different groups is shown in the strip chart (Panel A). Each data point represents an individual case. Mean plasma levels of BSG were significantly elevated in the PDAC stage I-II group relative to the three other groups (P<0.005, only significant comparisons by ANOVA and Tukey-Kramer tests). Subject demographics are shown in Panel B.

FIG. 2 shows dose-dependent enhancement of PDAC cell proliferation in direct co-culture with fibroblasts. The effect of co-culture with pancreas cancer-associated fibroblasts (PCAF2) on RFP-MIA PaCa-2 growth was monitored by RFP fluorescence. Numbers below the data bars represent initial plating densities for each cell type. RFP fluorescence intensity was measured 3 hours after initial plating to allow for adherence (Day 0) and then daily for three days. Relative growth is presented as the Day 3/Day 0 RFP fluorescence ratio relative to the Day 3/Day 0 RFP fluorescence ratio of RFP-MIA PaCa-2 cells alone (open bar). Relative growth was monitored for a constant number of initially plated RFP-MIA PaCa-2 cells co-cultured with an increasing number of PCAF2 cells (Panel A). The effect of co-culture with PCAF2 on RFP-MIA PaCa-2 cell growth was also monitored while keeping the initial total cell plating density constant (Panel B). Data represent the combined results of 3-4 independent experiments. For each experiment, 4-6 replicate wells for each condition were performed in parallel. Bracket indicates conditions significantly elevated relative to RFP-MIA PaCa-2 cells alone by ANOVA and Fisher's PLSD post-hoc test (P<0.01).

FIG. 3 shows stage-specific survival for PDAC. Data were abstracted from the SEER Research Database (2004-2010). The majority of cases are late stage, with highest survival seen for those cases in which the tumor is confined to the pancreas and less than 2 cm in its greatest dimension (Stage IA).

FIG. 4 shows interpretation of test results at various levels of test accuracy for diagnosis of PDAC. Outcomes are shown for a population of 100 million individuals (the current approximate number of individuals over the age of 50 in the United States) and assuming an annual disease prevalence of 4 in 10,000. FN=false-negative test results; FP=false-positive test results; TN=true-negative test results; TP=true-positive test results. Positive predictive value=probability of PDAC in a patient with a positive test result=TP/(TP+FP). Negative predictive value=probability of no PDAC in a patient with a negative test result=TN/(TN+FN). *$2.6 billion annual cost to treat. †$5.5 billion annual cost for a single contrast-enhanced computed tomography (CT) follow-up screen for each false positive determination (based on 2011 Medicare technical and professional reimbursement rate of $554/CT).
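The predictive-value definitions in the FIG. 4 legend can be worked through numerically. The following Python sketch is illustrative only: the population, prevalence and $554/CT rate are taken from the legend, while the 90% sensitivity and 90% specificity are assumed values chosen for the example, and the function name `screening_outcomes` is hypothetical.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Return expected TP, FN, TN, FP counts and predictive values for a screening test."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = diseased * sensitivity      # true positives
    fn = diseased - tp               # false negatives
    tn = healthy * specificity       # true negatives
    fp = healthy - tn                # false positives
    return {
        "TP": tp, "FN": fn, "TN": tn, "FP": fp,
        "PPV": tp / (tp + fp),       # positive predictive value = TP/(TP+FP)
        "NPV": tn / (tn + fn),       # negative predictive value = TN/(TN+FN)
    }

# Population and prevalence from the FIG. 4 legend; test accuracy is an assumption.
outcomes = screening_outcomes(100_000_000, 4 / 10_000, sensitivity=0.90, specificity=0.90)

# Cost of one follow-up CT per false positive at the $554/CT rate from the legend.
follow_up_cost = outcomes["FP"] * 554

print(f"PPV = {outcomes['PPV']:.4f}, NPV = {outcomes['NPV']:.6f}")
print(f"Follow-up CT cost = ${follow_up_cost / 1e9:.2f} billion")
```

Under these assumed accuracy values, roughly 10 million false positives arise from a population in which only 40,000 individuals have the disease, so the positive predictive value stays well below 1% and the follow-up imaging cost is of the same order as the amount stated in the legend. This is why a screening test for a rare disease needs very high specificity.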

FIG. 5 shows relative distributions for PDAC diagnostic biomarkers. Levels of CA 19-9, haptoglobin (Hp), MMP-7, osteopontin (OPN), and TIMP-1 demonstrate considerable overlap in serum from healthy control subjects (CON), chronic pancreatitis patients (ChPT) and pancreatic ductal adenocarcinoma (PDAC) patients. Each data point represents the biomarker level for an individual sample. Note that serum CA 19-9 levels are presented using a logarithmic scale. Green horizontal lines indicate the 95% specificity threshold for the individual biomarkers.

FIG. 6 shows characteristics of biomarker panels with 99% specificity. The modeled sensitivity of biomarker panels is given for panels consisting of between 10 and 100 individual biomarkers. The three panels represent different assumptions about the individual biomarkers: Chart A indicates panel characteristics assuming all biomarkers in the panel yield 45% sensitivity at the 95% specificity threshold (e.g. TIMP-1), Chart B indicates panel characteristics assuming all biomarkers in the panel yield 32% sensitivity at the 95% specificity threshold (the average of the 9 biomarkers examined), and Chart C indicates panel characteristics assuming all biomarkers in the panel yield 19% sensitivity at the 95% specificity threshold (e.g. haptoglobin). Different assumptions about the mean correlation ratios between individual biomarkers constituting the panels are indicated by the color legend. Numbers above the data points indicate the minimum number of biomarkers in the panels required to be above the 95% specificity threshold in order to make a positive diagnosis of PDAC.
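The kind of panel modeling summarized for FIG. 6 can be illustrated with a simple binomial calculation. The sketch below is not the model used to generate the figure: it assumes that all biomarkers in a panel are statistically independent (that is, zero correlation, which is only one of the correlation assumptions the figure considers) and that every biomarker has the same sensitivity at its 95% specificity threshold. The function names `tail_prob` and `panel_characteristics` are hypothetical.

```python
from math import comb

def tail_prob(n, k, p):
    """P(at least k of n independent markers exceed threshold), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def panel_characteristics(n, marker_sensitivity,
                          marker_specificity=0.95, target_specificity=0.99):
    """Return (minimum votes required, panel sensitivity) for a panel of n markers.

    The minimum vote count is the smallest k for which the probability that a
    disease-free subject has fewer than k markers above threshold is at least
    the target specificity.
    """
    false_positive_rate = 1 - marker_specificity
    for k in range(1, n + 1):
        panel_specificity = 1 - tail_prob(n, k, false_positive_rate)
        if panel_specificity >= target_specificity:
            return k, tail_prob(n, k, marker_sensitivity)
    raise ValueError("target specificity cannot be reached with this panel")

# Per-marker sensitivities taken from the three charts described for FIG. 6.
for sensitivity in (0.45, 0.32, 0.19):
    votes, panel_sens = panel_characteristics(10, sensitivity)
    print(f"marker sensitivity {sensitivity:.2f}: "
          f"{votes} votes required, panel sensitivity {panel_sens:.3f}")
```

Positive correlation between biomarkers would change both the required vote count and the resulting sensitivity, so the independent case shown here should be read only as a baseline for the calculation.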

FIG. 7 shows the diagnostic performance of the SuperLearner compared to CA 19-9 alone for distinguishing healthy control subjects from early stage PDAC cases. Receiver operating characteristic curves are shown for the CON vs. PDAC SuperLearner applied to the actual data (red squares), the cross-validated SuperLearner data (blue diamonds), and bootstrap data for CA 19-9 alone (green triangles). These data indicate that the SuperLearner will have improved performance over CA 19-9 alone when applied to additional samples.

FIG. 8 shows receiver operating characteristic curves for panels derived by feature selection lasso procedure for discriminating healthy controls and early stage PDAC cases (Panel A) and for discriminating both healthy controls and chronic pancreatitis cases from early stage PDAC cases (Panel B).
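A receiver operating characteristic curve of the kind shown in FIGS. 7 and 8 plots the true-positive rate against the false-positive rate as the decision threshold on a classifier score is varied. The following Python sketch shows one generic way to compute the curve and its area from two lists of scores. It does not reproduce the SuperLearner or lasso procedures, and the function names and the toy scores are hypothetical.

```python
def roc_points(case_scores, control_scores):
    """Return (false-positive rate, true-positive rate) pairs over all thresholds.

    A sample is called positive when its score is at or above the threshold.
    """
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy scores for illustration only; these are not data from the disclosure.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
curve = roc_points(cases, controls)
print(curve)
print(f"AUC = {auc(curve):.3f}")
```

An area of 1.0 corresponds to perfect separation of cases from controls and an area of 0.5 to a score with no discriminating ability.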

FIG. 9 shows boxplots indicating the comparison of serum analyte levels between healthy control subjects (CON, N=60), chronic pancreatitis cases (ChPT, N=60) and early stage pancreatic cancer cases (PDAC, N=60). Analytes are listed alphabetically and the P-value for the group comparison is indicated in bold in each plot (Kruskal-Wallis one-way analysis of variance by ranks).

FIG. 10 shows representative cascade plots for the threshold voting method used to identify diagnostic analyte panels. Each data point (tick mark) on the x-axis represents a unique sample. Individual analytes were assigned thresholds at the 95th percentile (95% specificity) for healthy controls (panel A) or the 95th percentile for healthy controls together with chronic pancreatitis cases (panel B). The numbers of analytes above the individual thresholds were then tabulated for all cases, including early stage PDAC cases. The number of votes (analytes above threshold) required to indicate the presence of disease (positive test) was chosen to yield high-specificity panels, and the resulting sensitivity was calculated. CON=healthy control subjects (N=60); ChPT=chronic pancreatitis subjects (N=60); PDAC=early stage pancreatic adenocarcinoma subjects (N=60).
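The threshold voting method described for FIG. 10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used to produce the figure: the percentile is computed by linear interpolation (one of several common conventions), the analyte names `analyte_1` and `analyte_2` and all numeric levels are hypothetical placeholders, and the function names are invented for the example.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) of a list of numbers, by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def fit_thresholds(reference_levels, q=95):
    """Map each analyte to the q-th percentile of its levels in the reference group."""
    return {analyte: percentile(levels, q)
            for analyte, levels in reference_levels.items()}

def count_votes(sample, thresholds):
    """Number of analytes in the sample that exceed their individual thresholds."""
    return sum(1 for analyte, t in thresholds.items() if sample[analyte] > t)

def is_positive(sample, thresholds, votes_required):
    """A test is positive when at least votes_required analytes are above threshold."""
    return count_votes(sample, thresholds) >= votes_required

# Hypothetical reference-group levels (e.g. healthy controls) for two analytes.
reference = {
    "analyte_1": list(range(1, 21)),
    "analyte_2": list(range(10, 210, 10)),
}
thresholds = fit_thresholds(reference)

# Hypothetical test sample with one analyte above its threshold.
sample = {"analyte_1": 25, "analyte_2": 100}
print(count_votes(sample, thresholds), is_positive(sample, thresholds, votes_required=2))
```

In practice the required number of votes would be chosen, as the legend describes, so that few reference subjects reach it, and the sensitivity would then be read off as the fraction of early stage PDAC samples that do.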

DETAILED DESCRIPTION

The present disclosure is based, in part, on the discovery that certain protein and/or nucleic acid biomarkers can be combined into panels that are reliable prognostic and diagnostic tools for PDAC. The disclosure provides biomarker panels, methods and kits for determining the probability for PDAC in an individual. One major advantage of the present disclosure is that the combination of biomarkers into a panel confers a high level of specificity and sensitivity on the prognostic and diagnostic methods described herein. The present invention is of particular benefit to asymptomatic populations of individuals with PDAC that would evade detection with traditional screening methods.

The present invention is further based, in part, on the discovery of basigin (BSG) as a biomarker for detection of early stage PDAC that upon being combined with additional biomarkers results in a strong classifier panel with unexpectedly high specificity and sensitivity. In addition, the differential levels of BSG in healthy control, chronic pancreatitis, early stage PDAC and late stage PDAC cases disclosed herein make it a powerful biomarker for related methods and applications.

Basigin (BSG), a cell surface glycoprotein, plays a role in the interaction of carcinoma and mesenchymal cells through the stimulation of extracellular matrix remodeling enzymes. Circulating BSG has also been demonstrated as a diagnostic/prognostic biomarker for numerous cancers. Our preliminary evidence indicates that plasma BSG is significantly elevated in early stage PDAC cases (stage I and II) compared to healthy controls, chronic pancreatitis cases, and late stage PDAC cases (stage III and IV), which had mean levels comparable to healthy control subjects. These results suggest that circulating BSG is an early stage phenomenon during PDAC development and the first early stage-specific biomarker identified for PDAC.

In one embodiment, the present disclosure includes methods for determining a probability for early stage pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability for early stage PDAC in the individual.

In another embodiment, the present disclosure includes methods for determining a probability of recurrence of pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability of recurrence of PDAC in the individual.

In a further embodiment, the present disclosure includes methods for prognosing length of survival of an individual diagnosed with pancreatic ductal adenocarcinoma (PDAC), the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to prognose length of survival for the individual with PDAC.

In one embodiment, the present disclosure includes a method of determining a probability of a positive response to therapy for pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability of a positive response to therapy for PDAC in said individual. In related embodiments, the therapy is selected from the group consisting of prophylactic anticoagulants, resection, neoadjuvant chemotherapy and chemoradiation.

In a further embodiment, the present disclosure includes a method of selecting a therapy for pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability of a positive response to therapy for PDAC in said individual. In related embodiments, the therapy is selected from the group consisting of prophylactic anticoagulants, resection, neoadjuvant chemotherapy and chemoradiation.

In one embodiment, the present disclosure includes methods for determining a probability for early stage pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in one of Tables 1, 2, 3, or 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability for early stage PDAC in the individual.

In another embodiment, the present disclosure includes methods for determining a probability of recurrence of pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in one of Tables 1, 2, 3, or 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability of recurrence of PDAC in the individual.

In a further embodiment, the present disclosure includes methods for prognosing length of survival of an individual diagnosed with pancreatic ductal adenocarcinoma (PDAC), the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in one of Tables 1, 2, 3, or 4 in a biological sample obtained from said individual, and analyzing said measurable feature to prognose length of survival for the individual with PDAC.

In one embodiment, the present disclosure includes a method of determining a probability of a positive response to therapy for pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in one of Tables 1, 2, 3, or 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability of a positive response to therapy for PDAC in said individual. In related embodiments, the therapy is selected from the group consisting of prophylactic anticoagulants, resection, neoadjuvant chemotherapy and chemoradiation.

In a further embodiment, the present disclosure includes a method of selecting a therapy for pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in one of Tables 1, 2, 3, or 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability of a positive response to therapy for PDAC in said individual. In related embodiments, the therapy is selected from the group consisting of prophylactic anticoagulants, resection, neoadjuvant chemotherapy and chemoradiation.

In addition to the specific biomarkers identified in this disclosure, for example, by accession number in a public database, sequence, or reference, the invention also contemplates use of biomarker variants that are at least 90% or at least 95% or at least 97% identical to the exemplified sequences and that are now known or later discovered and that have utility for the methods of the invention. These variants may represent polymorphisms, splice variants, mutations, and the like. In this regard, the instant specification discloses multiple art-known proteins in the context of the invention and provides exemplary accession numbers associated with one or more public databases as well as exemplary references to published journal articles relating to these art-known proteins. However, those skilled in the art appreciate that additional accession numbers and journal articles can easily be identified that can provide additional characteristics of the disclosed biomarkers and that the exemplified references are in no way limiting with regard to the disclosed biomarkers. As described herein, various techniques and reagents find use in the methods of the present invention. Suitable samples in the context of the present invention include, for example, blood, plasma, serum, amniotic fluid, vaginal secretions, saliva, and urine. In some embodiments, the biological sample is selected from the group consisting of whole blood, plasma, and serum. In a particular embodiment, the biological sample is serum. As described herein, biomarkers can be detected through a variety of assays and techniques known in the art. As further described herein, such assays include, without limitation, nucleic acid sequencing, polymerase chain reaction (PCR), mass spectrometry (MS)-based assays, antibody-based assays as well as assays that combine aspects of the two.

Protein biomarkers associated with the probability for PDAC in an individual include, but are not limited to, one or more of the isolated biomarkers listed in Tables 1 through 4. In addition to the specific biomarkers, the disclosure further includes biomarker variants that are about 90%, about 95%, or about 97% identical to the exemplified sequences. Variants, as used herein, include polymorphisms, splice variants, mutations, and the like.
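
By way of non-limiting illustration only, the identity thresholds recited above can be sketched in Python. The function names `percent_identity` and `meets_identity_threshold` and the example sequences are hypothetical and are not taken from this disclosure. The sketch assumes the two sequences have already been aligned to equal length without gaps; practical identity calculations generally rely on a gapped alignment produced by a dedicated alignment tool.

```python
def percent_identity(reference: str, variant: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(reference, variant) if a == b)
    return 100.0 * matches / len(reference)


def meets_identity_threshold(reference: str, variant: str, threshold: float = 90.0) -> bool:
    """True if the variant is at least `threshold` percent identical to the reference."""
    return percent_identity(reference, variant) >= threshold


# Hypothetical 10-residue sequences differing at a single position.
reference_seq = "MKTAYIAKQR"
variant_seq = "MKTAYIAKQK"
identity = percent_identity(reference_seq, variant_seq)
```

In this sketch a variant with one substitution in ten residues would satisfy a 90% threshold but not a 95% or 97% threshold.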

Additional markers can be selected from one or more risk indicia, including but not limited to, age, obesity (body mass index), history of smoking/tobacco use, history of chronic pancreatitis, history of diabetes, family history of pancreatic cancer, history of intraductal papillary mucinous neoplasm or pancreatic intraepithelial neoplasia, history of mutations in cyclin-dependent kinase inhibitor 2A (CDKN2A), breast cancer 1, early onset (BRCA1), breast cancer 2, early onset (BRCA2), serine/threonine kinase 11 (STK11), mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) (MSH2), mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) (MLH1), adenomatous polyposis coli (APC), partner and localizer of BRCA2 (PALB2), protease, serine, 1 (trypsin 1) (PRSS1), and serine peptidase inhibitor, Kazal type 1 (SPINK1) genes. Additional risk indicia useful as markers can be identified using learning algorithms known in the art, such as linear discriminant analysis, support vector machine classification, recursive feature elimination, prediction analysis of microarray, logistic regression, CART, FlexTree, LART, random forest, MART, and/or survival analysis regression, which are further described herein.

Provided herein are panels of isolated biomarkers comprising N of the biomarkers selected from the group listed in Tables 1 through 4. In the disclosed panels of biomarkers N can be a number selected from the group consisting of, for example, 2-6, 2-7, 2-8, 2-9, 2-10, 2-11, 2-12, and 2-24. In the disclosed methods, the number of biomarkers that are detected and whose levels are determined, can be 1, or more than 1, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more. In certain embodiments, the number of biomarkers that are detected, and whose levels are determined, can be 1, or more than 1, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, or more. The methods of this disclosure are useful for determining the probability for PDAC in an individual.

While certain of the biomarkers listed in Tables 1 through 4 are useful alone for determining the probability for PDAC in an individual, methods are also described herein for the grouping of multiple subsets of the biomarkers that are each useful as a panel of three or more biomarkers. In some embodiments, the invention provides panels comprising N biomarkers, wherein N is at least three biomarkers. In other embodiments, N is selected to be any number from 3-23 biomarkers.

In yet other embodiments, N is selected to be any number from 2-5, 2-10, 2-15, 2-20, or 2-23. In other embodiments, N is selected to be any number from 3-5, 3-10, 3-15, 3-20, or 3-23. In other embodiments, N is selected to be any number from 4-5, 4-10, 4-15, 4-20, or 4-23. In other embodiments, N is selected to be any number from 5-10, 5-15, 5-20, or 5-23. In other embodiments, N is selected to be any number from 6-10, 6-15, 6-20, or 6-23. In other embodiments, N is selected to be any number from 7-10, 7-15, 7-20, or 7-23. In other embodiments, N is selected to be any number from 8-10, 8-15, 8-20, or 8-23. In other embodiments, N is selected to be any number from 9-10, 9-15, 9-20, or 9-23. In other embodiments, N is selected to be any number from 10-15, 10-20, or 10-23. It will be appreciated that N can be selected to encompass similar, but higher order, ranges.
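
By way of non-limiting illustration only, the selection of panels of N biomarkers over a range of N can be sketched in Python as an enumeration of combinations. The function name `enumerate_panels` is hypothetical, and the seven-member candidate list is merely one illustrative subset of the biomarkers named in this disclosure; the sketch does not represent how any particular panel was actually chosen.

```python
from itertools import combinations


def enumerate_panels(biomarkers, n_min, n_max):
    """Yield every panel (as a tuple) containing between n_min and n_max biomarkers."""
    for n in range(n_min, n_max + 1):
        yield from combinations(biomarkers, n)


# Illustrative candidate list only; any subset of the disclosed biomarkers could be used.
candidates = ["BAG3", "CA 19-9", "CEACAM1", "HA", "IGFBP2", "PARK7", "SPP1"]

# All panels with N selected from 2-5.
panels = list(enumerate_panels(candidates, 2, 5))
```

Because the number of possible panels grows combinatorially with the size of the candidate list, an exhaustive enumeration of this kind is practical only for small lists; for larger lists, a feature-selection procedure would typically be used instead.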

In some embodiments, the panel of isolated biomarkers comprises one or more, two or more, three or more, four or more, or five isolated biomarkers selected from activated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG), AXL receptor tyrosine kinase (AXL), BCL2-associated athanogene 3 (BAG3), basigin (BSG, EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein) (CEACAM1), collagen, type XVIII, alpha 1 (endostatin) (COL18A1), epithelial cell adhesion molecule (EPCAM), soluble hyaluronic acid (HA), haptoglobin (HP), intercellular adhesion molecule 1 (ICAM1), insulin-like growth factor binding protein 2 (IGFBP2), insulin-like growth factor binding protein 4 (IGFBP4), lipocalin 2 (LCN2, NGAL), leucine-rich alpha-2-glycoprotein 1 (LRG1), matrix metallopeptidase 2 (MMP2, gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase), matrix metallopeptidase 7 (MMP7, matrilysin, uterine), matrix metallopeptidase 9 (MMP9, gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7), platelet basic protein (PPBP), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1, hevin), secreted phosphoprotein 1 (SPP1, osteopontin, OPN), transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin 1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A), and vascular endothelial growth factor C (VEGFC).

In certain embodiments, the panel of isolated biomarkers comprises oneor more, two or more, three or more, four or more, or five isolatedbiomarkers selected from activated leukocyte cell adhesion molecule(ALCAM), angiogenin (ANG), AXL receptor tyrosine kinase (AXL),alpha-2-glycoprotein 1, zinc-binding (AZGP1), BCL2-associated athanogene3 (BAG3), basigin (BSG) (EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9),carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5)(CEA), carcinoembryonic antigen-related cell adhesion molecule 1(CEACAM1) (biliary glycoprotein), collagen, type XVIII, alpha 1(endostatin) (COL18A1), epithelial cell adhesion molecule (EPCAM),gelsolin (GSN), soluble hyaluronic acid (HA), haptoglobin (HP),intercellular adhesion molecule 1 (ICAM1), insulin-like growth factorbinding protein 2 (IGFBP2), insulin-like growth factor binding protein 4(IGFBP4), lipocalin 2 (LCN2) (NGAL), LIM and senescent cell antigen-likedomains 1 (LIMS1) (PINCH), leucine-rich alpha-2-glycoprotein 1 (LRG1),lactoferrin (LTF), matrix metallopeptidase 11 (MMP11) (stromelysin 3),matrix metallopeptidase 2 (MMP2) (gelatinase A, 72 kDa gelatinase, 72kDa type IV collagenase), matrix metallopeptidase 7 (MMP7) (matrilysin,uterine), matrix metallopeptidase 9 (MMP9) (gelatinase B, 92 kDagelatinase, 92 kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein(PARK7), platelet factor 4 (PF4), plectin (PLEC), platelet basic protein(PPBP), proteoglycan 4 (PRG4), serum amyloid A (SAA), SPARC-like 1(SPARCL1)(hevin), secreted phosphoprotein 1 (SPP1) (osteopontin,OPN),transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosisfactor receptor superfamily, member 1A (TNFRSF1A), and vascularendothelial growth factor C (VEGFC).

In some embodiments, the panel of isolated biomarkers comprises one or more, two or more, three or more, four or more, or five isolated biomarkers selected from miR-100a, miR-1290, miR-155, miR-18a, miR-196a, miR-21, miR-210, miR-221, miR-24, miR-31, miR-375, and miR-885.

In additional embodiments, the biomarker panel comprises one or more,two or more, three or more, four or more, or five isolated biomarkersselected from the group consisting of KRAS, SMAD4, CDKN2A and TP53.

In some embodiments, the panel of isolated biomarkers comprises one or more, two or more, or three of the isolated biomarkers consisting of the biomarkers set forth in Tables 1 through 4.

In some embodiments, the panel of isolated biomarkers comprises one ormore peptides comprising a fragment of a biomarker selected fromactivated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG),AXL receptor tyrosine kinase (AXL), BCL2-associated athanogene 3 (BAG3),basigin (BSG, EMMPRIN, CD147), cancer antigen 19-9 (CA 19-9),carcinoembryonic antigen-related cell adhesion molecule 1 (biliaryglycoprotein) (CEACAM1), collagen, type XVIII, alpha 1 (endostatin)(COL18A1), epithelial cell adhesion molecule (EPCAM), soluble hyaluronicacid (HA), haptoglobin (HP), intercellular adhesion molecule 1 (ICAM1),insulin-like growth factor binding protein 2 (IGFBP2), insulin-likegrowth factor binding protein 4 (IGFBP4), lipocalin 2 (LCN2, NGAL),leucine-rich alpha-2-glycoprotein 1 (LRG1), matrix metallopeptidase 2(MMP2, gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase),matrix metallopeptidase 7 (MMP7, matrilysin, uterine), matrixmetallopeptidase 9 (MMP9, gelatinase B, 92 kDa gelatinase, 92 kDa typeIV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7), platelet basicprotein (PPBP), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1, hevin),secreted phosphoprotein 1 (SPP1, osteopontin, OPN), transforming growthfactor, beta-induced, 68 kDa (TGFBI), thrombospondin 1 (THBS1), TIMPmetallopeptidase inhibitor 1 (TIMP1), tumor necrosis factor receptorsuperfamily, member 1A (TNFRSF1A), vascular endothelial growth factor C(VEGFC).

In some embodiments, the panel of isolated biomarkers comprises one ormore peptides comprising a fragment of a biomarker selected fromactivated leukocyte cell adhesion molecule (ALCAM), angiogenin (ANG),AXL receptor tyrosine kinase (AXL), alpha-2-glycoprotein 1, zinc-binding(AZGP1), BCL2-associated athanogene 3 (BAG3), basigin (BSG) (EMMPRIN,CD147), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-relatedcell adhesion molecule 5 (CEACAM5) (CEA), carcinoembryonicantigen-related cell adhesion molecule 1 (CEACAM1) (biliaryglycoprotein), collagen, type XVIII, alpha 1 (endostatin) (COL18A1),epithelial cell adhesion molecule (EPCAM), gelsolin (GSN), solublehyaluronic acid (HA), haptoglobin (HP), intercellular adhesion molecule1 (ICAM1), insulin-like growth factor binding protein 2 (IGFBP2),insulin-like growth factor binding protein 4 (IGFBP4), lipocalin 2(LCN2) (NGAL), LIM and senescent cell antigen-like domains 1 (LIMS1)(PINCH), leucine-rich alpha-2-glycoprotein 1 (LRG1), lactoferrin (LTF),matrix metallopeptidase 11 (MMP11) (stromelysin 3), matrixmetallopeptidase 2 (MMP2) (gelatinase A, 72 kDa gelatinase, 72 kDa typeIV collagenase), matrix metallopeptidase 7 (MMP7) (matrilysin, uterine),matrix metallopeptidase 9 (MMP9) (gelatinase B, 92 kDa gelatinase, 92kDa type IV collagenase), mesothelin (MSLN), DJ-1 protein (PARK7),platelet factor 4 (PF4), plectin (PLEC), platelet basic protein (PPBP),proteoglycan 4 (PRG4), serum amyloid A (SAA), SPARC-like 1(SPARCL1)(hevin), secreted phosphoprotein 1 (SPP1) (osteopontin,OPN),transforming growth factor, beta-induced, 68 kDa (TGFBI), thrombospondin1 (THBS1), TIMP metallopeptidase inhibitor 1 (TIMP1), tumor necrosisfactor receptor superfamily, member 1A (TNFRSF1A), and vascularendothelial growth factor C (VEGFC).

The peptides and fragments of the invention can be isolated, synthetic or otherwise markedly different in structure, function, properties or characteristics from their naturally occurring counterparts.

As exemplified herein, accurate diagnostic algorithms can be devised using subsets of the biomarker panels described herein. In some embodiments, the panel of isolated biomarkers comprises one or more peptides comprising a fragment of a biomarker selected from BCL2-associated athanogene 3 (BAG3), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1), soluble hyaluronic acid (HA), insulin-like growth factor binding protein 2 (IGFBP2), DJ-1 protein (PARK7), and secreted phosphoprotein 1 (SPP1). In further embodiments, the panel of isolated biomarkers can be used in methods to distinguish healthy individuals from individuals with early stage PDAC.

In certain embodiments, the panel of isolated biomarkers comprises one or more, two or more, three or more, four or more, or five isolated biomarkers selected from BCL2-associated athanogene 3 (BAG3), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1), epithelial cell adhesion molecule (EPCAM), lipocalin 2 (LCN2), mesothelin (MSLN), DJ-1 protein (PARK7), proteoglycan 4 (PRG4), secreted phosphoprotein 1 (SPP1), and tumor necrosis factor receptor superfamily, member 1A (TNFRSF1A). In further embodiments, the panel of isolated biomarkers can be used in methods to distinguish healthy individuals and individuals afflicted with chronic pancreatitis from individuals with early stage PDAC.

In certain embodiments, the panel of isolated biomarkers comprises one or more, two or more, three or more, four or more, or five isolated biomarkers selected from activated leukocyte cell adhesion molecule (ALCAM), BCL2-associated athanogene 3 (BAG3), basigin (BSG), soluble hyaluronic acid (HA), DJ-1 protein (PARK7), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1) (hevin), and transforming growth factor, beta-induced, 68 kDa (TGFBI). In related embodiments, the panels of isolated biomarkers can further comprise basigin (BSG).

In additional embodiments, the panel of isolated biomarkers comprisesone or more peptides comprising a fragment of a biomarker selected fromBCL2-associated athanogene 3 (BAG3), cancer antigen 19-9 (CA 19-9),carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1),epithelial cell adhesion molecule (EPCAM), lipocalin 2 (LCN2),mesothelin (MSLN), DJ-1 protein (PARK7), proteoglycan 4 (PRG4), secretedphosphoprotein 1 (SPP1), and tumor necrosis factor receptor superfamily,member 1A (TNFRSF1A). In further embodiments, the panel of isolatedbiomarkers can be used in methods to distinguish healthy individuals andindividuals afflicted with chronic pancreatitis from early stage PDAC.

In additional embodiments, the panel of isolated biomarkers comprisesone or more peptides comprising a fragment of a biomarker selected fromactivated leukocyte cell adhesion molecule (ALCAM), BCL2-associatedathanogene 3 (BAG3), basigin (BSG), soluble hyaluronic acid (HA), DJ-1protein (PARK7), proteoglycan 4 (PRG4), SPARC-like 1 (SPARCL1)(hevin),and transforming growth factor, beta-induced, 68 kDa (TGFBI). In relatedembodiments, the panels of isolated biomarkers can further comprisebasigin (BSG).

In some embodiments, the panel of isolated biomarkers comprises one or more nucleic acids comprising a fragment of a biomarker selected from miR-100a, miR-1290, miR-155, miR-18a, miR-196a, miR-21, miR-210, miR-221, miR-24, miR-31, miR-375, and miR-885.

In additional embodiments, the biomarker panel comprises one or more nucleic acids comprising a fragment of a biomarker selected from KRAS, SMAD4, CDKN2A and TP53.

It must be noted that, as used in this specification and the appended claims, the singular forms “a”, “an” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a biomarker” includes a mixture of two or more biomarkers, and the like.

The term “about,” particularly in reference to a given quantity, is meant to encompass deviations of plus or minus five percent.
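
By way of non-limiting illustration only, the plus or minus five percent tolerance can be sketched in Python. The function name `is_about` is hypothetical. The sketch assumes that the deviation is measured relative to the stated quantity (so that "about 90" spans 85.5 to 94.5); the text above does not state whether a relative or an absolute deviation is intended, so this reading is an assumption.

```python
def is_about(value: float, nominal: float, tolerance: float = 0.05) -> bool:
    """True if `value` deviates from `nominal` by no more than `tolerance` (relative)."""
    return abs(value - nominal) <= tolerance * abs(nominal)


# Under the relative reading, 94 is "about 90" while 85 is not.
within = is_about(94.0, 90.0)
outside = is_about(85.0, 90.0)
```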

As used in this application, including the appended claims, the singular forms “a,” “an,” and “the” include plural references, unless the content clearly dictates otherwise, and are used interchangeably with “at least one” and “one or more.”

As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “contains,” “containing,” and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, product-by-process, or composition of matter that comprises, includes, or contains an element or list of elements does not include only those elements but can include other elements not expressly listed or inherent to such process, method, product-by-process, or composition of matter.

As used herein, the term “panel” refers to a composition, such as an array or a collection, comprising one or more biomarkers. The term can also refer to a profile or index of expression patterns of one or more biomarkers described herein. The number of biomarkers useful for a biomarker subset or panel is based on the sensitivity and specificity values for the particular combination of biomarker values. The terms “sensitivity” and “specificity” are used herein with respect to the ability to correctly classify an individual, based on one or more biomarker values detected in their biological sample, as having versus not having one or more of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other pancreatic and periampullary diseases. Sensitivity indicates the performance of the biomarker(s) with respect to correctly classifying individuals who have pancreatic disease, while specificity indicates the performance of the biomarker(s) with respect to correctly classifying individuals who do not have pancreatic disease. For example, 85% specificity and 90% sensitivity for a panel of markers used to test a set of control samples and pancreatic cancer samples indicates that 85% of the control samples were correctly classified as control samples by the panel, and 90% of the pancreatic cancer samples were correctly classified as pancreatic cancer samples by the panel. The biomarkers identified herein represent a relatively large number of choices for subsets or panels of biomarkers that can be used to effectively detect or diagnose early stage PDAC. Selection of the desired number of such biomarkers depends on the specific combination of biomarkers chosen.
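
By way of non-limiting illustration only, the sensitivity and specificity definitions above can be sketched in Python. The function name `sensitivity_specificity` is hypothetical, and the sample counts below are made-up values chosen to reproduce the 90% sensitivity and 85% specificity of the worked example; they are not experimental results.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for paired boolean labels.

    True denotes a pancreatic cancer sample; False denotes a control sample.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1  # cancer sample correctly classified as cancer
        elif truth and not predicted:
            fn += 1  # cancer sample misclassified as control
        elif not truth and not predicted:
            tn += 1  # control sample correctly classified as control
        else:
            fp += 1  # control sample misclassified as cancer
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer sample and one control sample")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative counts: 20 cancer samples (18 detected) and 20 controls (17 cleared).
truth = [True] * 20 + [False] * 20
calls = [True] * 18 + [False] * 2 + [False] * 17 + [True] * 3
sens, spec = sensitivity_specificity(truth, calls)
```

With these illustrative counts, sensitivity is 18/20 and specificity is 17/20, matching the 90% and 85% figures in the example above.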

As used herein, and unless otherwise specified, the terms “isolated” and “purified” generally describe a composition of matter that has been removed from its native environment (e.g., the natural environment if it is naturally occurring), and thus is altered by the hand of man from its natural state. An isolated protein or nucleic acid is distinct from the way it exists in nature.

The term “biomarker” refers to a biological molecule, or a fragment of a biological molecule, the change and/or the detection of which can be correlated with a particular physical condition or state. A change in a biomarker that can be correlated with a particular physical condition or state, for example, pancreatic cancer or pancreatic disease, can be quantitative, i.e., a change in the level that is detected, or qualitative, i.e., a mutation that is detected. The terms “marker,” “analyte,” and “biomarker” are used interchangeably throughout the disclosure. For example, the biomarkers of the present invention are correlated with an increased likelihood of one or more of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other pancreatic and periampullary diseases. Such biomarkers include, but are not limited to, biological molecules comprising nucleotides, nucleic acids, nucleosides, amino acids, sugars, fatty acids, steroids, metabolites, peptides, polypeptides, proteins, carbohydrates, lipids, hormones, antibodies, regions of interest that serve as surrogates for biological macromolecules and combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins). The term encompasses portions or fragments of a biological molecule, for example, a peptide fragment of a protein or polypeptide that comprises at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, or more consecutive amino acid residues. Similarly, the term also encompasses portions or fragments of a nucleic acid molecule, for example, fragments of a gene.

As used herein, “polypeptide,” “peptide,” and “protein” are usedinterchangeably herein to refer to polymers of amino acids of anylength. The polymer may be linear or branched, it may comprise modifiedamino acids, and it may be interrupted by non-amino acids. The termsalso encompass an amino acid polymer that has been modified naturally orby intervention; for example, disulfide bond formation, glycosylation,lipidation, acetylation, phosphorylation, or any other manipulation ormodification, such as conjugation with a labeling component. Alsoincluded within the definition are, for example, polypeptides containingone or more analogs of an amino acid (including, for example, unnaturalamino acids, etc.), as well as other modifications known in the art.Polypeptides can be single chains or associated chains. Also includedwithin the definition are preproteins and intact mature proteins;peptides or polypeptides derived from a mature protein; fragments of aprotein; splice variants; recombinant forms of a protein; proteinvariants with amino acid modifications, deletions, or substitutions;digests; and post-translational modifications, such as glycosylation,acetylation, phosphorylation, and the like. The peptides and fragmentsof the invention can be isolated, synthetic or otherwise markedlydifferent in structure, function, properties or characteristics fromtheir naturally occurring counterparts.

In some embodiments described herein, one or more biomarkers belong to a class of small RNAs referred to as microRNAs (miRNAs). First described in Caenorhabditis elegans in 1993, miRNAs were subsequently found to be conserved in various plant and animal species in 2000 and soon emerged as potentially useful diagnostic and prognostic markers in cancer. Unlike mRNA, miRNAs are only 19-25 nucleotides in size and do not encode amino-acid sequences. In contrast to the vast diversity of mRNA transcripts, the number of miRNA species is much smaller, with 851 human miRNAs reported to date (December 2007, miRBase, Sanger Institute). Despite this limited complexity, miRNAs have been shown to be of profound biological importance, negatively regulating gene expression at the post-transcriptional level. It is estimated based on sequence complementarity that each miRNA can potentially bind to hundreds of different mRNA species. Such binding, which occurs mostly at the 3′-untranslated regions of mRNA transcripts but may also occur at the 5′-untranslated or the coding region, could lead to decreased protein expression of the target gene, either by facilitating degradation of the target mRNA or by suppression of the translational machinery. Through these negative regulatory mechanisms, miRNAs have been shown to affect various biological processes in both normal and diseased states, including tumorigenesis in humans.

The invention also provides a method of determining probability for early stage PDAC in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from the individual, and analyzing the measurable feature to determine the probability for early stage PDAC in the individual. As disclosed herein, a measurable feature comprises fragments or derivatives of each of said N biomarkers selected from the biomarkers listed in Tables 1 through 4. In some embodiments of the disclosed methods, detecting a measurable feature comprises quantifying an amount of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4, combinations or portions and/or derivatives thereof in a biological sample obtained from the individual.

Although described and exemplified with reference to methods of determining probability for early stage PDAC in an individual, the present disclosure is similarly applicable to methods of determining probability for PDAC, chronic pancreatitis, late stage PDAC, and other periampullary diseases. It will be apparent to one skilled in the art that each of the aforementioned methods has specific and substantial utilities and benefits with regard to these related conditions.

In some embodiments, the method of determining probability for early stage PDAC in an individual and related methods disclosed herein comprise detecting a measurable feature of each of N biomarkers, wherein N is selected from the group consisting of 2 to 30. In further embodiments, the disclosed methods of determining probability for early stage PDAC in an individual and related methods disclosed herein comprise detecting a measurable feature of each of at least two isolated biomarkers selected from the group consisting of the biomarkers set forth in Tables 1 through 4. In further embodiments, the disclosed methods of determining probability for early stage PDAC in an individual and related methods disclosed herein comprise detecting a measurable feature of each of at least two isolated biomarkers selected from the group consisting of the biomarkers set forth in Table 1. In additional embodiments, the disclosed methods of determining probability for early stage PDAC in an individual and related methods disclosed herein comprise detecting a measurable feature of each of at least two isolated biomarkers selected from the group consisting of the biomarkers set forth in Table 2. In further embodiments, the disclosed methods of determining probability for early stage PDAC in an individual and related methods disclosed herein comprise detecting a measurable feature of each of at least two isolated biomarkers selected from the group consisting of the biomarkers set forth in Table 3.

As used herein, “individual” refers to a test subject or patient. The individual can be a mammal or a non-mammal. In various embodiments, the individual is a mammal. A mammalian individual can be a human or non-human. In various embodiments, the individual is a human. A healthy or normal individual is an individual in which the disease or condition of interest (including, for example, pancreatic diseases, pancreatic-associated diseases, or other pancreatic conditions) is not detectable by conventional diagnostic methods.

As used herein, “diagnose”, “diagnosing”, “diagnosis”, and variationsthereof refer to the detection, determination, or recognition of ahealth status or condition of an individual on the basis of one or moresigns, symptoms, data, or other information pertaining to thatindividual. The health status of an individual can be diagnosed ashealthy/normal (i.e., a diagnosis of the absence of a disease orcondition) or diagnosed as ill/abnormal (i.e., a diagnosis of thepresence, or an assessment of the characteristics, of a disease orcondition). The terms “diagnose”, “diagnosing”, “diagnosis”, etc.,encompass, with respect to a particular disease or condition, theinitial detection of the disease; the characterization or classificationof the disease; the detection of the progression, remission, orrecurrence of the disease; and the detection of disease response afterthe administration of a treatment or therapy to the individual. Thediagnosis of pancreatic cancer includes distinguishing individuals whohave cancer from individuals who do not.

As used herein, “prognose”, “prognosing”, “prognosis”, and variations thereof refer to the prediction of a future course of a disease or condition in an individual who has the disease or condition (e.g., predicting patient survival), and such terms encompass the evaluation of disease response after the administration of a treatment or therapy to the individual.

In additional embodiments, the methods of determining probability for early stage PDAC in an individual further encompass detecting a measurable feature for one or more risk indicia associated with PDAC including, for example, age, obesity (body mass index), history of smoking/tobacco use, history of chronic pancreatitis, history of diabetes, family history of pancreatic cancer, history of intraductal papillary mucinous neoplasm or pancreatic intraepithelial neoplasia, history of mutations in cyclin-dependent kinase inhibitor 2A (CDKN2A), breast cancer 1, early onset (BRCA1), breast cancer 2, early onset (BRCA2), serine/threonine kinase 11 (STK11), mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) (MSH2), mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) (MLH1), adenomatous polyposis coli (APC), partner and localizer of BRCA2 (PALB2), protease, serine, 1 (trypsin 1) (PRSS1), and serine peptidase inhibitor, Kazal type 1 (SPINK1) genes. In additional embodiments the risk indicia include, but are not limited to, age, sex, race, diet, history of previous pancreatic cancer, presence of hereditary pancreatic cancer syndrome (e.g., BRCA2 mutation, familial atypical multiple mole melanoma, Peutz-Jeghers Syndrome, hereditary pancreatitis), genetic (e.g., familial pancreatic cancer) considerations, and environmental exposure. In some embodiments, the individuals at risk for pancreatic cancer include, e.g., those having at least 2 first-degree relatives who have experienced pancreatic cancer without accumulation of other cancers or familial diseases, and those whose risk is determined by analysis of genetic or biochemical markers (e.g., BRCA2, p16, STK11/LKB1, or PRSS1 gene).

A “measurable feature” is any property, characteristic or aspect that can be determined and correlated with the probability for early stage PDAC in an individual. For a biomarker, such a measurable feature can include, for example, the presence, absence, or concentration of the biomarker, or a fragment thereof, in the biological sample, an altered structure, such as, for example, the presence or amount of a post-translational modification, such as oxidation at one or more positions on the amino acid sequence of the biomarker or, for example, the presence of an altered conformation in comparison to the conformation of the biomarker in normal control subjects, and/or the presence, amount, or altered structure of the biomarker as a part of a profile of more than one biomarker. In addition to biomarkers, measurable features can further include risk indicia including, for example, age, sex, race, diet, history of previous pancreatic cancer, presence of hereditary pancreatic cancer syndrome (e.g., BRCA2 mutation, familial atypical multiple mole melanoma, Peutz-Jeghers Syndrome, hereditary pancreatitis), genetic (e.g., familial pancreatic cancer) considerations, and environmental exposure.

In some embodiments of the disclosed methods of determining probability for early stage PDAC in an individual, the probability is calculated based on the quantified amount of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4. In some embodiments, the disclosed methods for determining the probability of early stage PDAC encompass detecting and/or quantifying one or more biomarkers using mass spectrometry, a capture agent or a combination thereof.
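
By way of non-limiting illustration only, one way a probability could be calculated from quantified biomarker amounts is a logistic model, logistic regression being one of the learning algorithms named in this disclosure. The Python sketch below is an assumption for illustration and is not presented as the model actually used; the function name `early_stage_probability`, the marker names, the coefficients and the intercept are all hypothetical placeholders rather than fitted values.

```python
import math


def early_stage_probability(amounts, coefficients, intercept):
    """Map quantified biomarker amounts to a probability with a logistic function.

    `amounts` and `coefficients` are dicts keyed by biomarker name.
    """
    missing = set(coefficients) - set(amounts)
    if missing:
        raise KeyError(f"no quantified amount for: {sorted(missing)}")
    # Linear combination of the quantified amounts.
    z = intercept + sum(coefficients[name] * amounts[name] for name in coefficients)
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical coefficients for two placeholder markers (not fitted values).
example_coefficients = {"marker_a": 0.8, "marker_b": -0.5}
example_amounts = {"marker_a": 2.0, "marker_b": 1.0}
probability = early_stage_probability(example_amounts, example_coefficients, intercept=-1.0)
```

In practice the coefficients would be estimated from a training set of samples with known disease status, and the amounts would typically be normalized before being combined.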

In some embodiments, the disclosed methods of determining probability for early stage PDAC in an individual encompass an initial step of providing a biomarker panel comprising N of the biomarkers listed in Tables 1 through 4. In additional embodiments, the disclosed methods of determining probability for early stage PDAC in an individual encompass an initial step of providing a biological sample from the individual.

In some embodiments, the disclosed methods of determining probability for PDAC in an individual encompass communicating the probability to a health care provider. As stated above, although described and exemplified with reference to determining probability for early stage PDAC in an individual, all embodiments described throughout this disclosure are similarly applicable to the methods of predicting early stage PDAC in an individual. It will be apparent to one skilled in the art that each of the aforementioned methods has specific and substantial utilities and benefits with regard to these related conditions.

In additional embodiments, the communication informs a subsequent treatment decision for the individual. In some embodiments, the method of determining probability for early stage PDAC in an individual encompasses the additional feature of expressing the probability as a risk score.

As used herein, the term “risk score” refers to a score that can be assigned based on comparing the amount of one or more biomarkers in a biological sample obtained from an individual to a standard or reference score that represents an average amount of the one or more biomarkers calculated from biological samples obtained from a random pool of individuals. A standard or reference score can be predetermined and built into a predictor model such that the comparison is indirect rather than actually performed every time the probability is determined for an individual. A risk score can be a standard (e.g., a number) or a threshold (e.g., a line on a graph). The value of the risk score correlates to the deviation, upwards or downwards, from the average amount of the one or more biomarkers calculated from biological samples obtained from a random pool of healthy individuals. In certain embodiments, if a risk score is greater than a standard or reference risk score, the individual can have an increased likelihood of early stage PDAC. In some embodiments, the magnitude of an individual's risk score, or the amount by which it exceeds a reference risk score, can be indicative of or correlated to that individual's level of risk.
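
The comparison of an individual's biomarker amounts to a reference average can be sketched as follows. This is a minimal illustration under stated assumptions: the deviation is expressed in units of the reference pool's standard deviation and averaged across biomarkers, and the biomarker names, reference values, and threshold are hypothetical. Other scoring schemes are equally consistent with the definition above.

```python
from statistics import mean, stdev

def risk_score(sample, reference_pool):
    """Mean signed deviation of a sample from the reference pool.

    Assumption: each deviation is scaled by the reference pool's sample
    standard deviation, so upward deviations are positive and downward
    deviations are negative.
    """
    deviations = []
    for name, amount in sample.items():
        ref = reference_pool[name]
        deviations.append((amount - mean(ref)) / stdev(ref))
    return mean(deviations)

# Hypothetical reference amounts from a pool of healthy individuals.
reference_pool = {
    "marker_A": [1.0, 2.0, 3.0],
    "marker_B": [10.0, 12.0, 14.0],
}
sample = {"marker_A": 4.0, "marker_B": 14.0}

score = risk_score(sample, reference_pool)
threshold = 1.0  # hypothetical reference risk score
elevated = score > threshold
```

Because the reference means and standard deviations do not depend on the individual being tested, they could be computed once and stored with the predictor model, which corresponds to the indirect comparison described above.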

In the context of the present invention, the term “biological sample” encompasses any sample that is taken from an individual and contains one or more of the biomarkers listed in Tables 1 through 4. Suitable samples in the context of the present invention include, for example, blood, plasma, serum, saliva, and urine. In some embodiments, the biological sample is selected from the group consisting of whole blood, plasma, and serum. In a particular embodiment, the biological sample is serum. As will be appreciated by those skilled in the art, a biological sample can include any fraction or component of blood, including, without limitation, T cells, monocytes, neutrophils, erythrocytes, platelets, circulating tumor cells, cell free DNA (cfDNA) and microvesicles such as exosomes and exosome-like vesicles.

Pancreatic cancer is a malignant neoplasm of the pancreas. About 95% of exocrine pancreatic cancers are pancreatic ductal adenocarcinomas (PDACs), also referred to as pancreatic adenocarcinomas (PAC). The remaining 5% include adenosquamous carcinomas, signet ring cell carcinomas, hepatoid carcinomas, colloid carcinomas, undifferentiated carcinomas, and undifferentiated carcinomas with osteoclast-like giant cells. Exocrine pancreatic tumors are far more common than pancreatic endocrine tumors, which make up about 1% of total cases.

In some embodiments, the pancreatic cancer is exocrine pancreatic cancer or endocrine pancreatic cancer. The exocrine pancreatic cancer includes, but is not limited to, adenocarcinomas, acinar cell carcinomas, adenosquamous carcinomas, colloid carcinomas, undifferentiated carcinomas with osteoclast-like giant cells, hepatoid carcinomas, intraductal papillary-mucinous neoplasms, mucinous cystic neoplasms, pancreatoblastomas, serous cystadenomas, signet ring cell carcinomas, solid and pseudopapillary tumors, pancreatic ductal carcinomas, and undifferentiated carcinomas. In some embodiments, the exocrine pancreatic cancer is pancreatic ductal carcinoma.

The endocrine pancreatic cancer includes, but is not limited to, insulinomas and glucagonomas.

In some embodiments, the pancreatic cancer is early stage pancreatic cancer, non-metastatic pancreatic cancer, primary pancreatic cancer, advanced pancreatic cancer, locally advanced pancreatic cancer, metastatic pancreatic cancer, unresectable pancreatic cancer, pancreatic cancer in remission, or recurrent pancreatic cancer. In some embodiments, the pancreatic cancer is locally advanced pancreatic cancer, unresectable pancreatic cancer, or metastatic pancreatic ductal carcinoma. In some embodiments, the pancreatic cancer is resectable (i.e., tumors that are confined to a portion of the pancreas or have spread just beyond it in a manner that allows for complete surgical removal), or locally advanced (unresectable) (i.e., the localized tumors may be unresectable because of local vessel impingement or invasion by tumor). In some embodiments, the pancreatic cancer is, according to American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) classifications, a stage 0 tumor (the tumor is confined to the top layers of pancreatic duct cells and has not invaded deeper tissues, and it has not spread outside of the pancreas, e.g., pancreatic carcinoma in situ or pancreatic intraepithelial neoplasia III), a stage IA tumor (the tumor is confined to the pancreas and is less than 2 cm in size, and it has not spread to nearby lymph nodes or distant sites), a stage IB tumor (the tumor is confined to the pancreas and is larger than 2 cm in size, and it has not spread to nearby lymph nodes or distant sites), a stage IIA tumor (the tumor is growing outside the pancreas but not into large blood vessels, and it has not spread to nearby lymph nodes or distant sites), a stage IIB tumor (the tumor is either confined to the pancreas or growing outside the pancreas but not into nearby large blood vessels or major nerves, and it has spread to nearby lymph nodes but not distant sites), a stage III tumor (the tumor is growing outside the pancreas into nearby large blood vessels or major nerves, and it may or may not have spread to nearby lymph nodes; it has not spread to distant sites) or a stage IV tumor (the cancer has spread to distant sites).

Traditional intervention options that can be considered based on the performance of the methods disclosed herein include, for example, alone or in combination, prophylactic anticoagulants, resection with curative intent, neoadjuvant chemotherapy or chemoradiation therapy, palliative resection or biliary decompression, palliative chemotherapy, or palliative chemoradiation therapy.

Surveillance options that can be considered in combination with the performance of the methods disclosed herein include, for example, alone or in combination, endoscopic ultrasound, endoscopic ultrasound-directed biopsy, endoscopic retrograde cholangiopancreatography, endoscopic retrograde cholangiopancreatography-directed ductal brushings or biopsy, or imaging by MRI or CT.

The term “amount” or “level” as used herein refers to a quantity of a biomarker that is detectable or measurable in a biological sample and/or control. The quantity of a biomarker can be, for example, a quantity of polypeptide, the quantity of nucleic acid, or the quantity of a fragment or surrogate. The term can alternatively include combinations thereof. The term “amount” or “level” of a biomarker is a measurable feature of that biomarker.

In some embodiments, calculating the probability for early stage PDAC in an individual is based on the quantified amount of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4. Any existing, available or conventional separation, detection and quantification methods can be used herein to measure the presence or absence (e.g., readout being present vs. absent; or detectable amount vs. undetectable amount) and/or quantity (e.g., readout being an absolute or relative quantity, such as, for example, absolute or relative concentration) of biomarkers, including nucleic acids, peptides, polypeptides, proteins and/or fragments thereof and optionally of the one or more other biomarkers or fragments thereof in samples. In some embodiments, detection and/or quantification of one or more biomarkers comprises an assay that utilizes a capture agent. In further embodiments, the capture agent is an antibody, antibody fragment, nucleic acid-based protein binding reagent, small molecule or variant thereof. In additional embodiments, the assay is an immunoassay such as enzyme immunoassay (EIA), enzyme-linked immunosorbent assay (ELISA), and radioimmunoassay (RIA). In some embodiments, detection and/or quantification of one or more biomarkers further comprises mass spectrometry (MS). In yet further embodiments, the mass spectrometry is co-immunoprecipitation-mass spectrometry (co-IP MS), where co-immunoprecipitation, a technique suitable for the isolation of whole protein complexes, is followed by mass spectrometric analysis.

Generally, any mass spectrometric (MS) technique that can provide precise information on the mass of peptides, and preferably also on fragmentation and/or (partial) amino acid sequence of selected peptides (e.g., in tandem mass spectrometry, MS/MS; or in post source decay, TOF MS), can be used in the methods disclosed herein. Suitable peptide MS and MS/MS techniques and systems are well-known per se (see, e.g., Methods in Molecular Biology, vol. 146: “Mass Spectrometry of Proteins and Peptides”, by Chapman, ed., Humana Press 2000; Biemann 1990. Methods Enzymol 193: 455-79; or Methods in Enzymology, vol. 402: “Biological Mass Spectrometry”, by Burlingame, ed., Academic Press 2005) and can be used in practicing the methods disclosed herein. Accordingly, in some embodiments, the disclosed methods comprise performing quantitative MS to measure one or more biomarkers. Such quantitative methods can be performed in an automated (Villanueva, et al., Nature Protocols (2006) 1(2):880-891) or semi-automated format. In particular embodiments, MS can be operably linked to a liquid chromatography device (LC-MS/MS or LC-MS) or gas chromatography device (GC-MS or GC-MS/MS). Other methods useful in this context include isotope-coded affinity tag (ICAT), tandem mass tags (TMT), or stable isotope labeling by amino acids in cell culture (SILAC), followed by chromatography and MS/MS.

As used herein, the terms “multiple reaction monitoring (MRM)” or “selected reaction monitoring (SRM)” refer to an MS-based quantification method that is particularly useful for quantifying analytes that are in low abundance. In an SRM experiment, a predefined precursor ion and one or more of its fragments are selected by the two mass filters of a triple quadrupole instrument and monitored over time for precise quantification. Multiple SRM precursor and fragment ion pairs can be measured within the same experiment on the chromatographic time scale by rapidly toggling between the different precursor/fragment pairs to perform an MRM experiment. A series of transitions (precursor/fragment ion pairs) in combination with the retention time of the targeted analyte (e.g., peptide or small molecule such as chemical entity, steroid, hormone) can constitute a definitive assay. A large number of analytes can be quantified during a single LC-MS experiment. The term “scheduled,” or “dynamic” in reference to MRM or SRM, refers to a variation of the assay wherein the transitions for a particular analyte are only acquired in a time window around the expected retention time, significantly increasing the number of analytes that can be detected and quantified in a single LC-MS experiment and contributing to the selectivity of the test, as retention time is a property dependent on the physical nature of the analyte. A single analyte can also be monitored with more than one transition. Finally, included in the assay can be standards that correspond to the analytes of interest (e.g., same amino acid sequence), but differ by the inclusion of stable isotopes. Stable isotopic standards (SIS) can be incorporated into the assay at precise levels and used to quantify the corresponding unknown analyte. An additional level of specificity is contributed by the co-elution of the unknown analyte and its corresponding SIS and properties of their transitions (e.g., the similarity in the ratio of the level of two transitions of the unknown and the ratio of the two transitions of its corresponding SIS).
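
The use of a stable isotopic standard for quantification, and the transition-ratio comparison that contributes specificity, can be sketched as follows. This is a simplified illustration rather than a validated assay: it assumes that the analyte and its SIS give equal signal per unit amount, and the function names, peak areas, spiked amount, and 20% tolerance are hypothetical.

```python
def quantify_with_sis(analyte_area, sis_area, sis_amount):
    """Estimate the analyte amount from the analyte/SIS peak-area ratio.

    Assumption: the unlabelled analyte and its stable-isotope-labelled
    standard respond equally, so the area ratio equals the amount ratio.
    """
    if sis_area <= 0:
        raise ValueError("SIS peak area must be positive")
    return analyte_area / sis_area * sis_amount

def transition_ratios_agree(analyte_areas, sis_areas, tolerance=0.2):
    """Check that two transitions have a similar area ratio in analyte and SIS.

    `tolerance` is a hypothetical relative limit; a large mismatch would
    suggest that an interfering species contributes to one transition.
    """
    analyte_ratio = analyte_areas[0] / analyte_areas[1]
    sis_ratio = sis_areas[0] / sis_areas[1]
    return abs(analyte_ratio - sis_ratio) / sis_ratio <= tolerance

# Hypothetical peak areas for two transitions of one peptide.
analyte = (50_000.0, 25_000.0)
sis = (100_000.0, 50_000.0)

# 10.0 units of SIS are assumed to have been spiked into the sample.
amount = quantify_with_sis(analyte[0], sis[0], sis_amount=10.0)
consistent = transition_ratios_agree(analyte, sis)
```

The estimated amount is expressed in the same units as the spiked SIS amount.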

Mass spectrometry assays, instruments and systems suitable for biomarker peptide analysis can include, without limitation, matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) MS; MALDI-TOF post-source-decay (PSD); MALDI-TOF/TOF; surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS); electrospray ionization mass spectrometry (ESI-MS); ESI-MS/MS; ESI-MS/(MS)n (n is an integer greater than zero); ESI 3D or linear (2D) ion trap MS; ESI triple quadrupole MS; ESI quadrupole orthogonal TOF (Q-TOF); ESI Fourier transform MS systems; desorption/ionization on silicon (DIOS); secondary ion mass spectrometry (SIMS); atmospheric pressure chemical ionization mass spectrometry (APCI-MS); APCI-MS/MS; APCI-(MS)n; ion mobility spectrometry (IMS); inductively coupled plasma mass spectrometry (ICP-MS); atmospheric pressure photoionization mass spectrometry (APPI-MS); APPI-MS/MS; and APPI-(MS)n. Peptide ion fragmentation in tandem MS (MS/MS) arrangements can be achieved using manners established in the art, such as, e.g., collision induced dissociation (CID). As described herein, detection and quantification of biomarkers by mass spectrometry can involve multiple reaction monitoring (MRM), such as described among others by Kuhn et al. Proteomics 4:1175-86 (2004). Scheduled multiple-reaction-monitoring (Scheduled MRM) mode acquisition during LC-MS/MS analysis enhances the sensitivity and accuracy of peptide quantitation. Anderson and Hunter, Molecular and Cellular Proteomics 5(4):573 (2006). As described herein, mass spectrometry-based assays can be advantageously combined with upstream peptide or protein separation or fractionation methods, such as for example with the chromatographic, immunoprecipitation, and other methods described herein below. As further described herein, shotgun quantitative proteomics can be combined with SRM/MRM-based assays for high-throughput identification and verification of prognostic biomarkers of early stage PDAC.

A person skilled in the art will appreciate that a number of methods can be used to determine the amount of a biomarker, including mass spectrometry approaches, such as MS/MS, LC-MS/MS, multiple reaction monitoring (MRM) or SRM and product-ion monitoring (PIM), and also including antibody-based methods such as immunoassays, including Western blots, enzyme-linked immunosorbent assay (ELISA), immunoprecipitation, immunohistochemistry, immunofluorescence, radioimmunoassay, dot blotting, and FACS. Accordingly, in some embodiments, determining the level of the at least one biomarker comprises using an immunoassay and/or mass spectrometric methods. In additional embodiments, the mass spectrometric methods are selected from MS, MS/MS, LC-MS/MS, SRM, PIM, and other such methods that are known in the art. In other embodiments, LC-MS/MS further comprises 1D LC-MS/MS, 2D LC-MS/MS or 3D LC-MS/MS. Immunoassay techniques and protocols are generally known to those skilled in the art (Price and Newman, Principles and Practice of Immunoassay, 2nd Edition, Grove's Dictionaries, 1997; and Gosling, Immunoassays: A Practical Approach, Oxford University Press, 2000). A variety of immunoassay techniques, including competitive and non-competitive immunoassays, can be used (Self et al., Curr. Opin. Biotechnol., 7:60-65 (1996)).

In further embodiments, the immunoassay is selected from Western blot, ELISA, immunoprecipitation, immunohistochemistry, immunofluorescence, radioimmunoassay (RIA), dot blotting, and FACS. In certain embodiments, the immunoassay is an ELISA. In yet a further embodiment, the ELISA is direct ELISA (enzyme-linked immunosorbent assay), indirect ELISA, sandwich ELISA, competitive ELISA, multiplex ELISA, ELISPOT technologies, and other similar techniques known in the art. Principles of these immunoassay methods are known in the art, for example John R. Crowther, The ELISA Guidebook, 1st ed., Humana Press 2000, ISBN 0896037282. Typically ELISAs are performed with antibodies but they can be performed with any capture agents that bind specifically to one or more biomarkers of the invention and that can be detected. Multiplex ELISA allows simultaneous detection of two or more analytes within a single compartment (e.g., microplate well) usually at a plurality of array addresses (Nielsen and Geierstanger, J Immunol Methods 290: 107-20 (2004) and Ling et al., Expert Rev Mol Diagn 7: 87-98 (2007)).

In some embodiments, radioimmunoassay (RIA) can be used to detect one or more biomarkers in the methods of the invention. RIA is a competition-based assay that is well known in the art and involves mixing known quantities of radioactively-labelled (e.g., ¹²⁵I or ¹³¹I-labelled) target analyte with antibody specific for the analyte, then adding non-labelled analyte from a sample and measuring the amount of labelled analyte that is displaced (see, e.g., An Introduction to Radioimmunoassay and Related Techniques, by Chard T, ed., Elsevier Science 1995, ISBN 0444821198 for guidance).
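
Because the non-labelled analyte displaces the labelled analyte, the bound radioactivity falls as the analyte concentration in the sample rises, and an unknown can be read from a curve of standards. The following sketch illustrates this with linear interpolation on a logarithmic concentration axis. The interpolation scheme, the function name, and all counts and concentrations are hypothetical assumptions; practical assays commonly fit a sigmoidal model instead.

```python
import math

def ria_concentration(bound_counts, standards):
    """Interpolate an analyte concentration from bound labelled counts.

    `standards` is a list of (concentration, bound_counts) pairs measured
    with known amounts of non-labelled analyte. The bound counts are
    assumed to decrease strictly as concentration increases.
    """
    points = sorted(standards)  # ascending concentration, descending counts
    for (c_lo, b_lo), (c_hi, b_hi) in zip(points, points[1:]):
        if b_hi <= bound_counts <= b_lo:
            # Fraction of the way from the lower to the higher standard.
            frac = (b_lo - bound_counts) / (b_lo - b_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("bound counts fall outside the standard curve")

# Hypothetical standards: (concentration, bound counts per minute).
standards = [(1.0, 9000.0), (10.0, 6000.0), (100.0, 3000.0)]
estimate = ria_concentration(4500.0, standards)
```

Counts outside the range spanned by the standards raise an error rather than being extrapolated.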

A detectable label can be used in the assays described herein for direct or indirect detection of the biomarkers in the methods of the invention. A wide variety of detectable labels can be used, with the choice of label depending on the sensitivity required, ease of conjugation with the antibody, stability requirements, and available instrumentation and disposal provisions. Those skilled in the art are familiar with selection of a suitable detectable label based on the assay detection of the biomarkers in the methods of the invention. Suitable detectable labels include, but are not limited to, fluorescent dyes (e.g., fluorescein, fluorescein isothiocyanate (FITC), Oregon Green™, rhodamine, Texas red, tetramethylrhodamine isothiocyanate (TRITC), Cy3, Cy5, etc.), fluorescent markers (e.g., green fluorescent protein (GFP), phycoerythrin, etc.), enzymes (e.g., luciferase, horseradish peroxidase, alkaline phosphatase, etc.), nanoparticles, biotin, digoxigenin, metals, and the like.

For mass spectrometry-based analysis, differential tagging with isotopic reagents, e.g., isotope-coded affinity tags (ICAT) or the more recent variation that uses isobaric tagging reagents, iTRAQ (Applied Biosystems, Foster City, Calif.), or tandem mass tags, TMT (Thermo Scientific, Rockford, Ill.), followed by multidimensional liquid chromatography (LC) and tandem mass spectrometry (MS/MS) analysis can provide a further methodology in practicing the methods of the invention.

A chemiluminescence assay using a chemiluminescent antibody can be used for sensitive, non-radioactive detection of protein levels. An antibody labeled with fluorochrome also can be suitable. Examples of fluorochromes include, without limitation, DAPI, fluorescein, Hoechst 33258, R-phycocyanin, B-phycoerythrin, R-phycoerythrin, rhodamine, Texas red, and lissamine. Indirect labels include various enzymes well known in the art, such as horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase, urease, and the like. Detection systems using suitable substrates for horseradish peroxidase, alkaline phosphatase, and beta-galactosidase are well known in the art.

A signal from the direct or indirect label can be analyzed, for example, using a spectrophotometer to detect color from a chromogenic substrate; a radiation counter to detect radiation such as a gamma counter for detection of ¹²⁵I; or a fluorometer to detect fluorescence in the presence of light of a certain wavelength. For detection of enzyme-linked antibodies, a quantitative analysis can be made using a spectrophotometer such as an EMAX Microplate Reader (Molecular Devices; Menlo Park, Calif.) in accordance with the manufacturer's instructions. If desired, assays used to practice the invention can be automated or performed robotically, and the signal from multiple samples can be detected simultaneously.

As described above, chromatography can also be used in practicing the methods of the invention. Chromatography encompasses methods for separating chemical substances and generally involves a process in which a mixture of analytes is carried by a moving stream of liquid or gas (“mobile phase”) and separated into components as a result of differential distribution of the analytes between the mobile phase and a stationary liquid or solid phase (“stationary phase”) as they flow around or over it. The stationary phase is usually a finely divided solid, a sheet of filter material, or a thin film of a liquid on the surface of a solid, or the like. Chromatography is well understood by those skilled in the art as a technique applicable for the separation of chemical compounds of biological origin, such as, e.g., amino acids, proteins, fragments of proteins or peptides, etc.

Chromatography can be columnar (i.e., wherein the stationary phase is deposited or packed in a column), preferably liquid chromatography, and yet more preferably high-performance liquid chromatography (HPLC), or ultra high performance/pressure liquid chromatography (UHPLC). Particulars of chromatography are well known in the art (Bidlingmeyer, Practical HPLC Methodology and Applications, John Wiley & Sons Inc., 1993). Exemplary types of chromatography include, without limitation, high-performance liquid chromatography (HPLC), UHPLC, normal phase HPLC (NP-HPLC), reversed phase HPLC (RP-HPLC), ion exchange chromatography (IEC), such as cation or anion exchange chromatography, hydrophilic interaction chromatography (HILIC), hydrophobic interaction chromatography (HIC), size exclusion chromatography (SEC) including gel filtration chromatography or gel permeation chromatography, chromatofocusing, affinity chromatography such as immuno-affinity, immobilised metal affinity chromatography, and the like. Chromatography, including single-, two- or more-dimensional chromatography, can be used as a peptide fractionation method in conjunction with a further peptide analysis method, such as for example, with a downstream mass spectrometry analysis as described elsewhere in this specification.

In the context of the invention, the term “capture agent” refers to a compound that can specifically bind to a target, in particular a biomarker. The term includes antibodies, antibody fragments, nucleic acid-based protein binding reagents (e.g., aptamers, Slow Off-rate Modified Aptamers (SOMAmer™)), protein-capture agents, natural ligands (e.g., a hormone for its receptor or vice versa), small molecules or variants thereof.

Capture agents can be configured to specifically bind to a target, in particular a biomarker. Capture agents can include but are not limited to organic molecules, such as polypeptides, polynucleotides and other non-polymeric molecules that are identifiable to a skilled person. In the embodiments disclosed herein, capture agents include any agent that can be used to detect, purify, isolate, or enrich a target, in particular a biomarker. Any art-known affinity capture technologies can be used to selectively isolate and enrich/concentrate biomarkers that are components of complex mixtures of biological media for use in the disclosed methods.

Antibody capture agents that specifically bind to a biomarker can be prepared using any suitable methods known in the art. See, e.g., Coligan, Current Protocols in Immunology (1991); Harlow & Lane, Antibodies: A Laboratory Manual (1988); Goding, Monoclonal Antibodies: Principles and Practice (2d ed. 1986). Antibody capture agents can be any immunoglobulin or derivative thereof, whether natural or wholly or partially synthetically produced. All derivatives thereof which maintain specific binding ability are also included in the term. Antibody capture agents have a binding domain that is homologous or largely homologous to an immunoglobulin binding domain and can be derived from natural sources, or partly or wholly synthetically produced. Antibody capture agents can be monoclonal or polyclonal antibodies. In some embodiments, an antibody is a single chain antibody. Those of ordinary skill in the art will appreciate that antibodies can be provided in any of a variety of forms including, for example, humanized, partially humanized, chimeric, chimeric humanized, etc. Antibody capture agents can be antibody fragments including, but not limited to, Fab, Fab′, F(ab′)2, scFv, Fv, dsFv, diabody, and Fd fragments. An antibody capture agent can be produced by any means. For example, an antibody capture agent can be enzymatically or chemically produced by fragmentation of an intact antibody and/or it can be recombinantly produced from a gene encoding the partial antibody sequence. An antibody capture agent can comprise a single chain antibody fragment. Alternatively or additionally, an antibody capture agent can comprise multiple chains which are linked together, for example, by disulfide linkages, and any functional fragments obtained from such molecules, wherein such fragments retain specific-binding properties of the parent antibody molecule. Because of their smaller size as functional components of the whole molecule, antibody fragments can offer advantages over intact antibodies for use in certain immunochemical techniques and experimental applications.

Suitable capture agents useful for practicing the invention also include aptamers. Aptamers are oligonucleotide sequences that can bind to their targets specifically via unique three dimensional (3-D) structures. An aptamer can include any suitable number of nucleotides and different aptamers can have either the same or different numbers of nucleotides. Aptamers can be DNA or RNA or chemically modified nucleic acids and can be single stranded, double stranded, or contain double stranded regions, and can include higher ordered structures. An aptamer can also be a photoaptamer, where a photoreactive or chemically reactive functional group is included in the aptamer to allow it to be covalently linked to its corresponding target. Use of an aptamer capture agent can include the use of two or more aptamers that specifically bind the same biomarker. An aptamer can include a tag. An aptamer can be identified using any known method, including the SELEX (systematic evolution of ligands by exponential enrichment) process. Once identified, an aptamer can be prepared or synthesized in accordance with any known method, including chemical synthetic methods and enzymatic synthetic methods, and used in a variety of applications for biomarker detection. Liu et al., Curr Med Chem. 18(27):4117-25 (2011). Capture agents useful in practicing the methods of the invention also include SOMAmers (Slow Off-Rate Modified Aptamers) known in the art to have improved off-rate characteristics. Brody et al., J Mol Biol. 422(5):595-606 (2012). SOMAmers can be generated using any known method, including the SELEX method.

It is understood by those skilled in the art that biomarkers can be modified prior to analysis to improve their resolution or to determine their identity. For example, the biomarkers can be subject to proteolytic digestion before analysis. Any protease can be used. Proteases, such as trypsin, that are likely to cleave the biomarkers into a discrete number of fragments are particularly useful. The fragments that result from digestion function as a fingerprint for the biomarkers, thereby enabling their detection indirectly. This is particularly useful where there are biomarkers with similar molecular masses that might be confused for the biomarker in question. Also, proteolytic fragmentation is useful for high molecular weight biomarkers because smaller biomarkers are more easily resolved by mass spectrometry. In another example, biomarkers can be modified to improve detection resolution. For instance, neuraminidase can be used to remove terminal sialic acid residues from glycoproteins to improve binding to an anionic adsorbent and to improve detection resolution. In another example, the biomarkers can be modified by the attachment of a tag of particular molecular weight that specifically binds to molecular biomarkers, further distinguishing them. Optionally, after detecting such modified biomarkers, the identity of the biomarkers can be further determined by matching the physical and chemical characteristics of the modified biomarkers in a protein database (e.g., SwissProt).
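
The way a protease such as trypsin yields a discrete, predictable set of fragments can be illustrated with a short in silico digestion. The sketch below applies the commonly used simplification that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P), and it ignores missed cleavages. The sequence is an arbitrary made-up example, not a biomarker of the invention, and the function name is hypothetical. Splitting on a zero-width pattern as done here requires Python 3.7 or later.

```python
import re

def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    # Lookbehind finds a position just after K or R; the negative
    # lookahead skips positions where the next residue is P.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]

# Arbitrary illustrative sequence (not taken from the disclosure).
peptides = tryptic_digest("MKWVTFISLLRPGKAAR")
```

For this input the fragments are `MK`, `WVTFISLLRPGK`, and `AAR`; the arginine followed by proline is not cleaved. The resulting peptide list is the kind of fingerprint that can be matched against a protein database.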

It is further appreciated in the art that biomarkers in a sample can be captured on a substrate for detection. Traditional substrates include antibody-coated 96-well plates or nitrocellulose membranes that are subsequently probed for the presence of the proteins. Alternatively, protein-binding molecules attached to microspheres, microparticles, microbeads, beads, or other particles can be used for capture and detection of biomarkers. The protein-binding molecules can be antibodies, peptides, peptoids, aptamers, small molecule ligands or other protein-binding capture agents attached to the surface of particles. Each protein-binding molecule can include a unique detectable label that is coded such that it can be distinguished from other detectable labels attached to other protein-binding molecules to allow detection of biomarkers in multiplex assays. Examples include, but are not limited to, color-coded microspheres with known fluorescent light intensities (see e.g., microspheres with xMAP technology produced by Luminex (Austin, Tex.)); microspheres containing quantum dot nanocrystals, for example, having different ratios and combinations of quantum dot colors (e.g., Qdot nanocrystals produced by Life Technologies (Carlsbad, Calif.)); glass coated metal nanoparticles (see e.g., SERS nanotags produced by Nanoplex Technologies, Inc. (Mountain View, Calif.)); barcode materials (see e.g., sub-micron sized striped metallic rods such as Nanobarcodes produced by Nanoplex Technologies, Inc.); encoded microparticles with colored bar codes (see e.g., CellCard produced by Vitra Bioscience, vitrabio.com); glass microparticles with digital holographic code images (see e.g., CyVera microbeads produced by Illumina (San Diego, Calif.)); chemiluminescent dyes; combinations of dye compounds; and beads of detectably different sizes.

In another aspect, biochips can be used for capture and detection of the biomarkers of the invention. Many protein biochips are known in the art. These include, for example, protein biochips produced by Packard BioScience Company (Meriden, Conn.), Zyomyx (Hayward, Calif.) and Phylos (Lexington, Mass.). In general, protein biochips comprise a substrate having a surface. A capture reagent or adsorbent is attached to the surface of the substrate. Frequently, the surface comprises a plurality of addressable locations, each of which has the capture agent bound there. The capture agent can be a biological molecule, such as a polypeptide or a nucleic acid, which captures other biomarkers in a specific manner. Alternatively, the capture agent can be a chromatographic material, such as an anion exchange material or a hydrophilic material. Examples of protein biochips are well known in the art.

Measuring mRNA in a biological sample can be used as a surrogate for detection of the level of the corresponding protein biomarker in a biological sample. Thus, any of the biomarkers or biomarker panels described herein can also be detected by detecting the appropriate RNA. Levels of mRNA can be measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to create a cDNA from the mRNA. The cDNA can be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell. Northern blots, microarrays, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004.
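
The comparison to a standard curve can be sketched as follows. The example assumes the usual linear relationship between the threshold cycle (Ct) and the base-10 logarithm of the starting copy number, fits that line to a dilution series by least squares, and inverts it for an unknown sample. The function names, copy numbers, and Ct values are hypothetical and were chosen so that the arithmetic is easy to follow.

```python
import math

def fit_standard_curve(standards):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    `standards` is a list of (copies, ct) pairs from a dilution series.
    """
    xs = [math.log10(copies) for copies, _ in standards]
    ys = [ct for _, ct in standards]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def copies_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate starting copies for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical ten-fold dilution series: (starting copies, measured Ct).
standards = [(1e3, 30.0), (1e4, 26.7), (1e5, 23.4), (1e6, 20.1)]
slope, intercept = fit_standard_curve(standards)
unknown_copies = copies_from_ct(25.05, slope, intercept)
```

The result is the number of copies in the reaction; expressing it per cell would additionally require the number of cells from which the RNA was extracted, which is outside this sketch.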

As described herein, in some embodiments the biomarkers of the invention encompass one or more RNAs. The RNA isolated from the sample may be total RNA, mRNA, microRNA, tRNA, rRNA or any type of RNA. In particular embodiments, the biomarkers of the invention encompass one or more miRNAs selected from the group consisting of miR-100a, miR-1290, miR-155, miR-18a, miR-196a, miR-21, miR-210, miR-221, miR-24, miR-31, miR-375, and miR-885.

Conventional methods and reagents for isolating RNA from a sample include the High Pure miRNA Isolation Kit (Roche), Trizol (Invitrogen), guanidinium thiocyanate-phenol-chloroform extraction, PureLink™ miRNA isolation kit (Invitrogen), PureLink Micro-to-Midi Total RNA Purification System (Invitrogen), RNeasy kit (Qiagen), miRNeasy kit (Qiagen), Oligotex kit (Qiagen), phenol extraction, phenol-chloroform extraction, TCA/acetone precipitation, ethanol precipitation, column purification, silica gel membrane purification, PureYield™ RNA Midiprep (Promega), PolyATtract System 1000 (Promega), Maxwell™ 16 System (Promega), SV Total RNA Isolation (Promega), geneMAG-RNA/DNA kit (Chemicell), TRI Reagent™ (Ambion), RNAqueous Kit (Ambion), ToTALLY RNA™ Kit (Ambion), Poly(A)Purist™ Kit (Ambion) and any other methods, commercially available or not, known to the skilled person. If the sample is a formalin-fixed, paraffin-embedded (FFPE) sample, the tissue sections are initially deparaffinised, such as in xylene and ethanol.

The RNA may be further amplified, cleaned up, concentrated, DNase treated, quantified or otherwise analysed or examined, such as by agarose gel electrophoresis, absorbance spectrometry or Bioanalyser analysis (Agilent), or subjected to any other post-extraction method known to the skilled person.

Methods for extracting and analysing an RNA sample are disclosed in Molecular Cloning: A Laboratory Manual (Sambrook and Russell (eds.), 3rd edition (2001), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA).

The isolated RNA may be analysed by quantitative (“real-time”) PCR (QPCR). In one embodiment, the expression level of one or more miRNAs is determined by the quantitative polymerase chain reaction (QPCR) technique. Real-time polymerase chain reaction, also called quantitative polymerase chain reaction (Q-PCR/qPCR/RT-QPCR) or kinetic polymerase chain reaction, is a technique based on the polymerase chain reaction, which is used to amplify and simultaneously quantify a targeted DNA molecule. It enables both detection and quantification (as absolute number of copies or relative amount when normalized to DNA input or additional normalizing genes) of a specific sequence in a DNA sample.

The procedure follows the general principle of polymerase chain reaction; its key feature is that the amplified DNA is quantified as it accumulates in the reaction in real time after each amplification cycle. Two common methods of quantification are the use of fluorescent dyes that intercalate with double-stranded DNA, and modified DNA oligonucleotide probes that fluoresce when hybridized with a complementary DNA. Frequently, real-time polymerase chain reaction is combined with reverse transcription polymerase chain reaction to quantify low abundance messenger RNA (mRNA), or miRNA, enabling a researcher to quantify relative gene expression at a particular time, or in a particular cell or tissue type.

In a real-time PCR assay, a positive reaction is detected by accumulation of a fluorescent signal. The Ct (cycle threshold) is defined as the number of cycles required for the fluorescent signal to cross the threshold (i.e., to exceed the background level). Ct levels are inversely related to the amount of target nucleic acid in the sample (i.e., the lower the Ct level, the greater the amount of target nucleic acid in the sample). Most real-time assays undergo 40 cycles of amplification. The QPCR may be performed using chemicals and/or machines from a commercially available platform.
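
By way of illustration only, the inverse relationship between Ct and target amount can be turned into a relative expression value using the widely used comparative Ct (2^-ΔΔCt) calculation. This particular calculation is not prescribed herein; it is offered as one possible sketch and assumes approximately 100% amplification efficiency (a doubling per cycle) and the availability of a normalizing reference RNA. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target in a sample relative to a control.

    Each target Ct is first normalized to a reference RNA measured in the
    same specimen (delta Ct), and the sample is then compared with the
    control (delta-delta Ct). Assumes product doubles every cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: the target crosses the threshold 3 cycles earlier
# in the sample than in the control, with identical reference Ct values.
fold = relative_expression(24.0, 20.0, 27.0, 20.0)  # 8.0-fold higher
```

A difference of three cycles corresponds to 2^3, an eight-fold difference in starting template, which is why lower Ct values indicate more target.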

The QPCR may be performed using QPCR machines from any commercially available platform, such as Prism, GeneAmp or StepOne Real Time PCR systems (Applied Biosystems), LightCycler (Roche), RapidCycler (Idaho Technology), MasterCycler (Eppendorf), iCycler iQ system, Chromo 4 system, CFX, MiniOpticon and Opticon systems (Bio-Rad), SmartCycler system (Cepheid), RotorGene system (Corbett Lifescience), MX3000 and MX3005 systems (Stratagene), DNA Engine Opticon system (Qiagen), Quantica qPCR systems (Techne), InSyte and Syncrom cycler system (BioGene), DT-322 (DNA Technology), Exicycler Notebook Thermal cycler, TL998 System (Tianlong), Line-Gene-K systems (Bioer Technology), or any other commercially available platform.

The QPCR may be performed using chemicals from any commercially available platform, such as NCode EXPRESS qPCR or EXPRESS qPCR (Invitrogen), Taqman or SYBR green qPCR systems (Applied Biosystems), Real-Time PCR reagents (Eurogentec), iTaq mix (Bio-Rad), qPCR mixes and kits (Biosense), and any other chemicals, commercially available or not, known to the skilled person. The QPCR reagents and detection system may be probe-based, or may be based on intercalating a fluorescent chemical into double-stranded oligonucleotides. The QPCR reaction may be performed in a tube, such as a single tube, a tube strip or a plate, or it may be performed in a microfluidic card in which the relevant probes and/or primers are already integrated.

A microfluidic card allows high throughput, parallel analysis of mRNA or miRNA expression patterns, and allows for a quick and cost-effective investigation of biological pathways. The microfluidic card may be a piece of plastic that is riddled with microchannels and chambers filled with the appropriate probes. A sample in fluid form is injected into one end of the card, and capillary action causes the fluid sample to be distributed into the microchannels. The microfluidic card is then placed in an appropriate device for processing the card and reading the signal.

Any commercially available (predesigned or custom-made) microfluidic card, for example, TaqMan™ Array Human MicroRNA A+B Cards V2.0 (Applied Biosystems), can be used. Said microfluidic card may comprise a number of probes and/or primers for analysing the expression of a number of miRNAs, such as between 1-10 miRNAs, for example 10-20 miRNAs, such as between 20-30 miRNAs, for example 30-40 miRNAs, such as between 40-50 miRNAs, for example 50-100 miRNAs, such as between 100-200 miRNAs, for example 200-300 miRNAs, such as between 300-400 miRNAs, for example 400-500 miRNAs, such as between 500-1000 miRNAs.

The isolated RNA may be analysed by microarray analysis. In one embodiment, the expression level of one or more miRNAs is determined by the microarray technique. A microarray is a multiplex technology that consists of an arrayed series of thousands of microscopic spots of DNA oligonucleotides or antisense miRNA probes, called features, each containing picomoles of a specific oligonucleotide sequence. This can be a short section of a gene or other DNA or RNA element that is used as a probe to hybridize a DNA or RNA sample (called the target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine relative abundance of nucleic acid sequences in the target. In standard microarrays, the probes are attached to a solid surface by a covalent bond to a chemical matrix (via epoxy-silane, amino-silane, lysine, polyacrylamide or others). The solid surface can be glass or a silicon chip, in which case it is commonly known as a gene chip. DNA arrays are so named because they either measure DNA or use DNA as part of their detection system. The DNA probe may, however, be a modified DNA structure such as LNA (locked nucleic acid).

In some embodiments, the microarray analysis is used to detect microRNA, which is known as microRNA or miRNA expression profiling. The microarray for detection of microRNA may be a microarray platform, wherein the probes of the microarray may be comprised of antisense miRNAs or DNA oligonucleotides. In the first case, the target is a labelled sense miRNA sequence, and in the latter case the miRNA has been reverse transcribed into cDNA and labelled. The microarray for detection of microRNA may be a commercially available array platform, such as NCode™ miRNA Microarray Expression Profiling (Invitrogen), miRCURY LNA™ microRNA Arrays (Exiqon), microRNA Array (Agilent), μParaflo™ Microfluidic Biochip Technology (LC Sciences), MicroRNA Profiling Panels (Illumina), Geniom™ Biochips (Febit Inc.), microRNA Array (Oxford Gene Technology), Custom AdmiRNA™ profiling service (Applied Biological Materials Inc.), microRNA Array (Dharmacon-Thermo Scientific), LDA TaqMan analyses (Applied Biosystems), Taqman microRNA Array (Applied Biosystems) or any other commercially available array. Microarray analysis may comprise all or a subset of the steps of RNA isolation, RNA amplification, reverse transcription, target labelling, hybridisation onto a microarray chip, image analysis and normalisation, and subsequent data analysis; each of these steps may be performed according to a manufacturer's protocol.

It follows that any of the methods as disclosed herein above, e.g., for determining the prognosis of an individual with pancreatic cancer, may further comprise one or more of the steps of: i) isolating miRNA from a sample, ii) determining an expression profile of said miRNA in said sample, wherein the miRNA molecules comprise the nucleic acids listed in Table 2, thereby determining the prognosis of pancreatic cancer in said individual. The expression profile can be generated using any method known in the art, including, without limitation, oligonucleotide microarrays, microRNA (miRNA) arrays, high-throughput sequencing and high-throughput quantitative polymerase chain reaction (qPCR).

One skilled in the art will appreciate that isolated RNA may be analyzed by any method known in the art including, for example, northern blotting or nuclease protection assay. Northern blotting combines denaturing agarose gel or polyacrylamide gel electrophoresis for size separation of RNA with methods to transfer the size-separated RNA to a filter membrane for probe hybridization. The hybridization probe may be made from DNA or RNA. Nuclease protection assay is a technique used to identify individual RNA molecules in a heterogeneous RNA sample extracted from cells. The technique can identify one or more RNA molecules of known sequence even at low total concentration. The extracted RNA is first mixed with antisense RNA or DNA probes that are complementary to the sequence or sequences of interest and the complementary strands are hybridized to form double-stranded RNA (or a DNA-RNA hybrid). The mixture is then exposed to ribonucleases that specifically cleave only single-stranded RNA but have no activity against double-stranded RNA. When the reaction runs to completion, susceptible RNA regions are degraded to very short oligomers or to individual nucleotides; the surviving RNA fragments are those that were complementary to the added antisense strand and thus contained the sequence of interest.

Some embodiments disclosed herein relate to diagnostic and prognostic methods of determining the probability for PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases. The detection of the level of expression of one or more biomarkers and/or the determination of a ratio of biomarkers can be used to determine the probability for each of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases. Such detection methods can be used, for example, for early diagnosis of the condition, to determine whether a subject is predisposed to PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases, to monitor the progress of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases or the progress of treatment protocols, to assess the severity of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases, to forecast the outcome of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases and/or prospects of recovery, or to aid in the determination of a suitable treatment for PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases. In some embodiments, the methods disclosed herein can be performed to determine the probability of a response to a particular treatment in an individual. In other embodiments, the methods disclosed herein can be performed to determine the probability that an individual develops venous thromboembolism. Such suitable treatments include, for example, resection, chemotherapy, and chemoradiation therapy.

In some embodiments disclosed herein, the methods of the invention can be used to determine the probability for survival of an individual diagnosed with PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases.

In some embodiments disclosed herein, the methods of the invention can be used to determine the probability for disease recurrence in an individual diagnosed with PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases.

The quantitation of biomarkers in a biological sample can be determined, without limitation, by the methods described above as well as any other method known in the art. The quantitative data thus obtained is then subjected to an analytic classification process. In such a process, the raw data is manipulated according to an algorithm, where the algorithm has been pre-defined by a training set of data, for example as described in the examples provided herein. An algorithm can utilize the training set of data provided herein, or can utilize the guidelines provided herein to generate an algorithm with a different set of data.

In some embodiments, analyzing a measurable feature to determine the probability of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases encompasses the use of a predictive model. In further embodiments, analyzing a measurable feature to determine the probability for PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases encompasses comparing said measurable feature with a reference feature. As those skilled in the art can appreciate, such comparison can be a direct comparison to the reference feature or an indirect comparison where the reference feature has been incorporated into the predictive model. In further embodiments, analyzing a measurable feature to determine the probability of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases encompasses one or more of a linear discriminant analysis model, a support vector machine classification algorithm, a recursive feature elimination model, a prediction analysis of microarray model, a logistic regression model, a CART algorithm, a flex tree algorithm, a LART algorithm, a random forest algorithm, a MART algorithm, a machine learning algorithm, a penalized regression method, or a combination thereof. In particular embodiments, the analysis comprises logistic regression.
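
By way of illustration only, a logistic regression model of the kind named above can be sketched in a few lines. The sketch fits the model by plain batch gradient descent on a small, entirely hypothetical training set of two transformed biomarker levels per sample; the function names, the learning rate, the number of iterations and the data are assumptions made for the example and do not reproduce the models or data of the examples herein.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    """Modelled probability that a sample with features x is a case."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical training set: two biomarker levels per sample; 1 = case, 0 = control.
X = [[0.2, 1.1], [0.4, 0.9], [0.3, 1.3], [0.5, 1.0],
     [1.6, 2.0], [1.8, 2.4], [1.5, 2.2], [1.9, 1.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
p_new = predict_proba([1.7, 2.1], weights, bias)  # probability for a new sample
```

In practice a tested statistical package, a held-out validation set and some form of regularization would normally be preferred over a hand-written optimizer such as this one.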

An analytic classification process can use any one of a variety of statistical analytic methods to manipulate the quantitative data and provide for classification of the sample. Examples of useful methods include linear discriminant analysis, recursive feature elimination, a prediction analysis of microarray, a logistic regression, a CART algorithm, a FlexTree algorithm, a LART algorithm, a random forest algorithm, a MART algorithm, machine learning algorithms, etc.

Classification can be made according to predictive modeling methods that set a threshold for determining the probability that a sample belongs to a given class. The probability preferably is at least 50%, or at least 60%, or at least 70%, or at least 80% or higher. Classifications also can be made by determining whether a comparison between an obtained dataset and a reference dataset yields a statistically significant difference. If so, then the sample from which the dataset was obtained is classified as not belonging to the reference dataset class. Conversely, if such a comparison is not statistically significantly different from the reference dataset, then the sample from which the dataset was obtained is classified as belonging to the reference dataset class.

The predictive ability of a model can be evaluated according to its ability to provide a quality metric, e.g. AUROC (area under the ROC curve) or accuracy, of a particular value, or range of values. Area under the curve measures are useful for comparing the accuracy of a classifier across the complete data range. Classifiers with a greater AUC have a greater capacity to classify unknowns correctly between two groups of interest. In some embodiments, a desired quality threshold is a predictive model that will classify a sample with an accuracy of at least about 0.5, at least about 0.55, at least about 0.6, at least about 0.7, at least about 0.75, at least about 0.8, at least about 0.85, at least about 0.9, at least about 0.95, or higher. As an alternative measure, a desired quality threshold can refer to a predictive model that will classify a sample with an AUC of at least about 0.7, at least about 0.75, at least about 0.8, at least about 0.85, at least about 0.9, or higher.
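
By way of illustration only, the two quality metrics discussed above can be computed from model scores as follows. The AUC is computed here through its rank interpretation, namely the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function names and the scores are hypothetical.

```python
def auroc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def accuracy(case_scores, control_scores, threshold):
    """Fraction of samples classified correctly at a single threshold."""
    correct = sum(s >= threshold for s in case_scores)
    correct += sum(s < threshold for s in control_scores)
    return correct / (len(case_scores) + len(control_scores))

# Hypothetical model scores for four cases and four controls.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]

auc = auroc(cases, controls)           # 15 of 16 case/control pairs ordered correctly
acc = accuracy(cases, controls, 0.5)   # 6 of 8 samples correct at a 0.5 threshold
```

The example shows why the two measures can differ: accuracy depends on the one threshold chosen, whereas the AUC summarizes performance over all thresholds.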

As is known in the art, the relative sensitivity and specificity of a predictive model can be adjusted to favor either the specificity metric or the sensitivity metric, where the two metrics have an inverse relationship. The limits in a model as described above can be adjusted to provide a selected sensitivity or specificity level, depending on the particular requirements of the test being performed. One or both of sensitivity and specificity can be at least about 0.7, at least about 0.75, at least about 0.8, at least about 0.85, at least about 0.9, or higher.
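
By way of illustration only, the adjustment of a model limit to reach a selected specificity can be sketched as follows. The sketch scans candidate thresholds in ascending order, so that specificity rises while sensitivity falls, and returns the first threshold that meets the requested specificity. The function names and the scores are hypothetical.

```python
def sensitivity_specificity(case_scores, control_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(s >= threshold for s in case_scores)
    true_neg = sum(s < threshold for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)

def threshold_for_specificity(case_scores, control_scores, min_specificity):
    """Lowest observed score usable as a threshold with the required specificity.

    Returns (threshold, sensitivity, specificity), or None if no observed
    score achieves the requested specificity.
    """
    candidates = sorted(set(case_scores) | set(control_scores))
    for t in candidates:
        sens, spec = sensitivity_specificity(case_scores, control_scores, t)
        if spec >= min_specificity:
            return t, sens, spec
    return None

# Hypothetical model scores for four cases and four controls.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]

lenient = threshold_for_specificity(cases, controls, 0.75)  # (0.4, 1.0, 0.75)
strict = threshold_for_specificity(cases, controls, 1.0)    # (0.7, 0.75, 1.0)
```

Raising the required specificity from 0.75 to 1.0 moves the threshold from 0.4 to 0.7 and lowers sensitivity from 1.0 to 0.75, which is the inverse relationship referred to above.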

The raw data can be initially analyzed by measuring the values for each biomarker, usually in triplicate or in multiple triplicates. The data can be manipulated; for example, raw data can be transformed using standard curves, and the triplicate measurements used to calculate the average and standard deviation for each patient. These values can be transformed before being used in the models, e.g. log-transformed or Box-Cox transformed (Box and Cox, Royal Stat. Soc., Series B, 26:211-246 (1964)). The data are then input into a predictive model, which will classify the sample according to the state. The resulting information can be communicated to a patient or health care provider.
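
By way of illustration only, the summarizing and transformation steps described above can be sketched as follows. The Box-Cox transform is (x^λ - 1)/λ for λ ≠ 0 and reduces to the natural logarithm at λ = 0. The function names, the triplicate values and the choice of λ are hypothetical; in practice λ would be estimated from the data.

```python
import math
import statistics

def summarize_triplicate(values):
    """Mean and sample standard deviation of replicate measurements."""
    return statistics.mean(values), statistics.stdev(values)

def box_cox(x, lam):
    """Box-Cox power transform of a strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

# Hypothetical triplicate concentrations for one biomarker in one patient.
triplicate = [10.0, 12.0, 14.0]
mean_level, sd_level = summarize_triplicate(triplicate)

log_level = box_cox(mean_level, 0)     # log transform (lambda = 0)
root_level = box_cox(mean_level, 0.5)  # square-root-like transform
```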

To generate a predictive model for PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases, a robust data set, comprising known control samples and samples corresponding to the classification of interest, is used in a training set. A sample size can be selected using generally accepted criteria. As discussed above, different statistical methods can be used to obtain a highly accurate predictive model.

In one embodiment, hierarchical clustering is performed in the derivation of a predictive model, where the Pearson correlation is employed as the clustering metric. One approach is to consider an early stage PDAC dataset as a “learning sample” in a problem of “supervised learning.” CART is a standard in applications to medicine (Singer, Recursive Partitioning in the Health Sciences, Springer (1999)) and can be modified by transforming any qualitative features to quantitative features; sorting them by attained significance levels, evaluated by sample reuse methods for Hotelling's T² statistic; and suitable application of the lasso method. Problems in prediction are turned into problems in regression without losing sight of prediction, indeed by making suitable use of the Gini criterion for classification in evaluating the quality of regressions.
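
By way of illustration only, hierarchical clustering with the Pearson correlation as the metric can be sketched as follows. The sketch uses 1 - r as the dissimilarity and average linkage to merge clusters; the choice of average linkage, the function names and the biomarker profiles are assumptions made for the example, since no particular linkage rule is specified herein.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cluster(profiles, n_clusters):
    """Agglomerative clustering on the dissimilarity 1 - r (average linkage)."""
    names = list(profiles)
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i in names for j in names if i != j}

    def linkage(c1, c2):
        return sum(dist[(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average dissimilarity.
        _, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters

# Hypothetical profiles: four samples measured on four biomarkers.
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.5],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [8.0, 6.5, 4.0, 2.0],
}
groups = cluster(profiles, 2)  # s1 with s2, and s3 with s4
```

Because the metric is a correlation, samples s1 and s2 group together despite differing roughly two-fold in magnitude; the clustering responds to the shape of the profile rather than its scale.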

This approach led to what is termed FlexTree (Huang, Proc. Natl. Acad. Sci. U.S.A. 101:10529-10534 (2004)). FlexTree performs very well in simulations and when applied to multiple forms of data and is useful for practicing the claimed methods. Software automating FlexTree has been developed. Alternatively, LARTree or LART can be used (Turnbull (2005) Classification Trees with Subset Analysis Selection by the Lasso, Stanford University). The name reflects binary trees, as in CART and FlexTree; the lasso, as has been noted; and the implementation of the lasso through what is termed LARS by Efron et al., Annals of Statistics 32:407-451 (2004). See, also, Huang et al., Proc. Natl. Acad. Sci. USA. 101(29):10529-34 (2004). Other methods of analysis that can be used include logic regression. One method of logic regression is described by Ruczinski, Journal of Computational and Graphical Statistics 12:475-512 (2003). Logic regression resembles CART in that its classifier can be displayed as a binary tree. It is different in that each node has Boolean statements about features that are more general than the simple “and” statements produced by CART.

Another approach is that of nearest shrunken centroids (Tibshirani, Proc. Natl. Acad. Sci. U.S.A. 99:6567-72 (2002)). The technology is k-means-like, but has the advantage that by shrinking cluster centers, one automatically selects features, as is the case in the lasso, to focus attention on small numbers of those that are informative. The approach is available as PAM software and is widely used. Two further sets of algorithms that can be used are random forests (Breiman, Machine Learning 45:5-32 (2001)) and MART (Hastie, The Elements of Statistical Learning, Springer (2001)). These two methods are known in the art as “committee methods,” which involve predictors that “vote” on outcome.
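
By way of illustration only, the feature selection that results from shrinking cluster centers can be sketched with the soft-thresholding step used in nearest shrunken centroids. The sketch is deliberately simplified: it takes already standardized differences between a class centroid and the overall centroid as given, and it omits the standardization and the classification rule. The function names, the marker names and the values are hypothetical.

```python
def soft_threshold(d, delta):
    """Move d toward zero by delta; values within delta of zero become zero."""
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0

def shrink_centroid_offsets(offsets, delta):
    """Shrink each marker's centroid offset and list the markers that survive."""
    shrunk = {marker: soft_threshold(d, delta) for marker, d in offsets.items()}
    selected = [marker for marker, d in shrunk.items() if d != 0.0]
    return shrunk, selected

# Hypothetical standardized offsets of one class centroid from the overall centroid.
offsets = {"marker_a": 2.5, "marker_b": -0.4, "marker_c": 0.9, "marker_d": -1.75}

shrunk, selected = shrink_centroid_offsets(offsets, 1.0)
# marker_b and marker_c shrink to zero and no longer influence classification
```

Increasing the shrinkage amount removes more markers, so the amount is typically chosen by cross-validation to balance panel size against classification error.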

To provide significance ordering, the false discovery rate (FDR) can be determined. First, a set of null distributions of dissimilarity values is generated. In one embodiment, the values of observed profiles are permuted to create a sequence of distributions of correlation coefficients obtained by chance, thereby creating an appropriate set of null distributions of correlation coefficients (Tusher et al., Proc. Natl. Acad. Sci. U.S.A. 98, 5116-21 (2001)). The set of null distributions is obtained by: permuting the values of each profile for all available profiles; calculating the pair-wise correlation coefficients for all profiles; calculating the probability density function of the correlation coefficients for this permutation; and repeating the procedure N times, where N is a large number, usually 300. Using the N distributions, one calculates an appropriate measure (mean, median, etc.) of the count of correlation coefficient values that exceed the value (of similarity) that is obtained from the distribution of experimentally observed similarity values at a given significance level.

The FDR is the ratio of the number of the expected falsely significant correlations (estimated from the correlations greater than this selected Pearson correlation in the set of randomized data) to the number of correlations greater than this selected Pearson correlation in the empirical data (significant correlations). This cut-off correlation value can be applied to the correlations between experimental profiles. Using the aforementioned distribution, a level of confidence is chosen for significance. This is used to determine the lowest value of the correlation coefficient that exceeds the result that would have been obtained by chance. Using this method, one obtains thresholds for positive correlation, negative correlation or both. Using this threshold(s), the user can filter the observed values of the pair-wise correlation coefficients and eliminate those that do not exceed the threshold(s). Furthermore, an estimate of the false positive rate can be obtained for a given threshold. For each of the individual “random correlation” distributions, one can find how many observations fall outside the threshold range. This procedure provides a sequence of counts. The mean and the standard deviation of the sequence provide the average number of potential false positives and its standard deviation.
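
By way of illustration only, the permutation procedure described above can be sketched as follows for a positive-correlation cut-off. Each profile is permuted independently, the pair-wise Pearson correlations are recomputed, and the number exceeding the cut-off is counted; repeating this N times gives a sequence of counts whose mean estimates the expected number of falsely significant correlations. The function names, the fixed random seed, the cut-off and the profiles are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def count_above(profiles, cutoff):
    """Number of profile pairs whose correlation exceeds the cut-off."""
    return sum(pearson(a, b) > cutoff for a, b in combinations(profiles, 2))

def permutation_fdr(profiles, cutoff, n_permutations=300, seed=0):
    """Estimate the FDR at a correlation cut-off by permuting each profile."""
    rng = random.Random(seed)
    observed = count_above(profiles, cutoff)
    null_counts = []
    for _ in range(n_permutations):
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        null_counts.append(count_above(shuffled, cutoff))
    expected_false = statistics.mean(null_counts)   # average false positives
    spread = statistics.stdev(null_counts)          # their standard deviation
    fdr = expected_false / observed if observed else float("nan")
    return fdr, expected_false, spread, observed

# Hypothetical expression profiles of four analytes across six samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.3, 2.9, 4.2, 5.1, 5.8],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    [2.0, 7.0, 1.0, 8.0, 3.0, 5.0],
]
fdr, expected_false, spread, observed = permutation_fdr(profiles, cutoff=0.9)
```

A threshold for negative correlation can be handled in the same way by counting correlations below a negative cut-off.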

In addition, Cox models can be used, especially since reductions of numbers of covariates to manageable size with the lasso will significantly simplify the analysis, allowing the possibility of a nonparametric or semi-parametric approach to prediction of time to early stage PDAC. These statistical tools are known in the art and applicable to all manner of proteomic data. A set of biomarker, clinical and genetic data that can be easily determined, and that is highly informative regarding the probability for early stage PDAC in an individual, is provided.

Accordingly, one skilled in the art understands that the probability for PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases according to the invention can be determined using either a quantitative or a categorical variable. For example, in practicing the methods of the invention the measurable feature of each of N biomarkers can be subjected to categorical data analysis to determine the probability for PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases as a binary categorical outcome. Alternatively, the methods of the invention may analyze the measurable feature of each of N biomarkers by initially calculating quantitative variables, in particular, predicted onset of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases. The predicted onset of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, or other periampullary diseases can subsequently be used as a basis to predict risk of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, or other periampullary diseases, respectively. By initially using a quantitative variable and subsequently converting the quantitative variable into a categorical variable, the methods of the invention take into account the continuum of measurements detected for the measurable features.

In the development of a predictive model, it can be desirable to select a subset of markers, i.e. at least 3, at least 4, at least 5, at least 6, up to the complete set of markers. Usually a subset of markers will be chosen that provides for the needs of the quantitative sample analysis, e.g. availability of reagents, convenience of quantitation, etc., while maintaining a highly accurate predictive model. The selection of a number of informative markers for building classification models requires the definition of a performance metric and a user-defined threshold for producing a model with useful predictive ability based on this metric. For example, the performance metric can be the AUC, the sensitivity and/or specificity of the prediction as well as the overall accuracy of the prediction model.

As will be understood by those skilled in the art, an analytic classification process can use any one of a variety of statistical analytic methods to manipulate the quantitative data and provide for classification of the sample. Examples of useful methods include, without limitation, linear discriminant analysis, recursive feature elimination, a prediction analysis of microarray, a logistic regression, a CART algorithm, a FlexTree algorithm, a LART algorithm, a random forest algorithm, a MART algorithm, and machine learning algorithms.

The subset of markers can be chosen by forward selection or backward selection. The number of markers can be selected that will optimize the performance of a model without the use of all the markers. One way to define the optimum number of terms is to choose the number of terms that produce a model with desired predictive ability (e.g. an AUC>0.75, or equivalent measures of sensitivity/specificity) that lies no more than one standard error from the maximum value obtained for this metric using any combination and number of terms used for the given algorithm. As exemplified herein, mathematical modeling of existing serum biomarker data, by allowing for diverse responses between cases, allows for a biomarker panel to be devised that has greater than 99% accuracy for diagnosis of a low prevalence cancer, pancreatic ductal adenocarcinoma. The results described in Example 5 provide a framework for identifying useful biomarker characteristics and minimizing biomarker correlation.
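
By way of illustration only, the one-standard-error criterion described above can be sketched as follows. The sketch assumes that a mean AUC and its standard error have already been estimated (for example by cross-validation) for models with different numbers of markers, and it takes the standard error of the best-performing model as the tolerance; that choice, the function name and the numeric values are assumptions made for the example.

```python
def select_marker_count(results, min_auc=0.75):
    """Smallest number of markers within one standard error of the best AUC.

    `results` maps a number of markers to (mean AUC, standard error).
    Returns None if no model both exceeds `min_auc` and lies within one
    standard error of the maximum.
    """
    best_n = max(results, key=lambda n: results[n][0])
    best_auc, best_se = results[best_n]
    floor = best_auc - best_se
    for n in sorted(results):
        auc, _ = results[n]
        if auc >= floor and auc > min_auc:
            return n
    return None

# Hypothetical cross-validated performance by number of markers in the panel.
results = {
    1: (0.70, 0.03),
    2: (0.82, 0.03),
    3: (0.88, 0.02),
    4: (0.90, 0.02),
    5: (0.905, 0.02),
    6: (0.90, 0.02),
}
chosen = select_marker_count(results)  # 4 markers
```

Here the five-marker model has the highest AUC (0.905), but the four-marker model (0.90) lies within one standard error of it and is therefore preferred as the smaller panel.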

In yet another aspect, the invention provides kits for determining probability of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, or other periampullary diseases, wherein the kits can be used to detect N of the isolated biomarkers listed in Tables 1 through 4. For example, the kits can be used to detect one or more, two or more, or three of the isolated biomarkers selected from the group consisting of the biomarkers set forth in Tables 1 through 4.

In another aspect, the kits can be used to detect one or more, two or more, three or more, four or more, five or more, six or more, seven or more, or eight of the isolated biomarkers selected from the group consisting of the biomarkers set forth in Tables 1 through 4.

The kit can include one or more agents for detection of biomarkers; a container for holding a biological sample isolated from an individual; and printed instructions for reacting agents with the biological sample or a portion of the biological sample to detect the presence or amount of the isolated biomarkers in the biological sample. The agents can be packaged in separate containers. The kit can further comprise one or more control reference samples and reagents for performing an immunoassay.

In one embodiment, the kit comprises agents for measuring the levels of at least N of the isolated biomarkers listed in Tables 1 through 4. The kit can include antibodies that specifically bind to these biomarkers; for example, the kit can contain at least one antibody that specifically binds to a biomarker selected from the group listed in Table 1.

The kit can comprise one or more containers for compositions contained in the kit. Compositions can be in liquid form or can be lyophilized. Suitable containers for the compositions include, for example, bottles, vials, syringes, and test tubes. Containers can be formed from a variety of materials, including glass or plastic. The kit can also comprise a package insert containing written instructions for methods of determining probability of PDAC, chronic pancreatitis, early stage PDAC, late stage PDAC, and other periampullary diseases.

From the foregoing description, it will be apparent that variations and modifications can be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.

The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.

TABLE 1
Protein Analytes

Analyte     Name
ALCAM       activated leukocyte cell adhesion molecule
ANG         angiogenin
AXL         AXL receptor tyrosine kinase
AZGP1       alpha-2-glycoprotein 1, zinc-binding
BAG3        BCL2-associated athanogene 3
BSG         basigin (EMMPRIN, CD147)
CA 19-9     cancer antigen 19-9
CEACAM5     carcinoembryonic antigen-related cell adhesion molecule 5 (CEA)
CEACAM1     carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein)
COL18A1     collagen, type XVIII, alpha 1 (endostatin)
EPCAM       epithelial cell adhesion molecule
GSN         gelsolin
HA          soluble hyaluronic acid
HP          haptoglobin
ICAM1       intercellular adhesion molecule 1
IGFBP2      insulin-like growth factor binding protein 2
IGFBP4      insulin-like growth factor binding protein 4
LCN2        lipocalin 2 (NGAL)
LIMS1       LIM and senescent cell antigen-like domains 1 (PINCH)
LRG1        leucine-rich alpha-2-glycoprotein 1
LTF         lactoferrin
MMP11       matrix metallopeptidase 11 (stromelysin 3)
MMP2        matrix metallopeptidase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase)
MMP7        matrix metallopeptidase 7 (matrilysin, uterine)
MMP9        matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase)
MSLN        mesothelin
PARK7       DJ-1 protein
PF4         platelet factor 4
PLEC        plectin
PPBP        platelet basic protein
PRG4        proteoglycan 4
SAA         serum amyloid A
SPARCL1     SPARC-like 1 (hevin)
SPP1        secreted phosphoprotein 1 (osteopontin, OPN)
TGFBI       transforming growth factor, beta-induced, 68 kDa
THBS1       thrombospondin 1
TIMP1       TIMP metallopeptidase inhibitor 1
TNFRSF1A    tumor necrosis factor receptor superfamily, member 1A
VEGFC       vascular endothelial growth factor C

TABLE 2 microRNA Analytes

Analyte
miR-100a
miR-1290
miR-155
miR-18a
miR-196a
miR-21
miR-210
miR-221
miR-24
miR-31
miR-375
miR-885

TABLE 3 Genetic Lesion Analytes

Gene
KRAS (Gene ID: 3845)
SMAD4 (Gene ID: 4089)
CDKN2A (Gene ID: 1029)
TP53 (Gene ID: 7157)

TABLE 4 Analytes Tested by ELISA

Analyte Identifier  Analyte Name
ALCAM               activated leukocyte cell adhesion molecule
ANG                 angiogenin
AXL                 AXL receptor tyrosine kinase
BAG3                BCL2-associated athanogene 3
BSG                 basigin (EMMPRIN, CD147)
CA 19-9             cancer antigen 19-9
CEACAM1             carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein)
COL18A1             collagen, type XVIII, alpha 1 (endostatin)
EPCAM               epithelial cell adhesion molecule
HA                  soluble hyaluronic acid
HP                  haptoglobin
ICAM1               intercellular adhesion molecule 1
IGFBP2              insulin-like growth factor binding protein 2
IGFBP4              insulin-like growth factor binding protein 4
LCN2                lipocalin 2 (NGAL)
LRG1                leucine-rich alpha-2-glycoprotein 1
MMP2                matrix metallopeptidase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase)
MMP7                matrix metallopeptidase 7 (matrilysin, uterine)
MMP9                matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase)
MSLN                mesothelin
PARK7               DJ-1 protein
PPBP                platelet basic protein
PRG4                proteoglycan 4
SPARCL1             SPARC-like 1 (hevin)
SPP1                secreted phosphoprotein 1 (osteopontin, OPN)
TGFBI               transforming growth factor, beta-induced, 68 kDa
THBS1               thrombospondin 1
TIMP1               TIMP metallopeptidase inhibitor 1
TNFRSF1A            tumor necrosis factor receptor superfamily, member 1A
VEGFC               vascular endothelial growth factor C

TABLE 5 Individual Analyte Performance for Comparison of Healthy Controls (CON), Chronic Pancreatitis Cases (ChPT), and Early-stage Pancreatic Ductal Adenocarcinoma Cases (PDAC)

Area Under Receiver Operating Characteristic Curve (95% Confidence Interval)

Analyte   CON vs. PDAC         ChPT vs. PDAC        CON vs. ChPT         CON + ChPT vs. PDAC
ALCAM     0.775 (0.687-0.858)  0.693 (0.591-0.793)  0.617 (0.516-0.723)  0.734 (0.649-0.816)
ANG       0.536 (0.433-0.636)  0.566 (0.467-0.669)  0.543 (0.436-0.649)  0.551 (0.459-0.640)
AXL       0.747 (0.657-0.833)  0.648 (0.551-0.745)  0.599 (0.491-0.700)  0.697 (0.610-0.776)
BAG3      0.538 (0.436-0.646)  0.578 (0.475-0.678)  0.607 (0.505-0.706)  0.520 (0.431-0.609)
BSG       0.553 (0.444-0.661)  0.623 (0.516-0.719)  0.576 (0.476-0.680)  0.588 (0.495-0.679)
CA 19-9   0.858 (0.781-0.926)  0.785 (0.698-0.867)  0.641 (0.532-0.736)  0.821 (0.737-0.896)
CEACAM1   0.888 (0.826-0.943)  0.775 (0.692-0.858)  0.668 (0.572-0.763)  0.831 (0.758-0.894)
COL18A1   0.695 (0.595-0.786)  0.549 (0.449-0.656)  0.623 (0.523-0.716)  0.622 (0.536-0.707)
EPCAM     0.508 (0.396-0.609)  0.672 (0.569-0.764)  0.704 (0.608-0.799)  0.582 (0.491-0.672)
HA        0.776 (0.688-0.854)  0.622 (0.518-0.719)  0.683 (0.589-0.774)  0.699 (0.610-0.779)
HP        0.614 (0.512-0.713)  0.581 (0.479-0.689)  0.694 (0.600-0.785)  0.517 (0.431-0.603)
ICAM1     0.806 (0.716-0.882)  0.635 (0.530-0.735)  0.768 (0.674-0.846)  0.721 (0.626-0.810)
IGFBP2    0.733 (0.640-0.821)  0.495 (0.394-0.602)  0.712 (0.619-0.798)  0.614 (0.534-0.696)
IGFBP4    0.597 (0.493-0.699)  0.556 (0.445-0.654)  0.648 (0.541-0.746)  0.521 (0.426-0.607)
LCN2      0.497 (0.391-0.599)  0.585 (0.483-0.686)  0.612 (0.512-0.713)  0.541 (0.452-0.635)
LRG1      0.469 (0.366-0.578)  0.501 (0.400-0.603)  0.532 (0.424-0.630)  0.485 (0.397-0.571)
MMP2      0.605 (0.499-0.711)  0.552 (0.447-0.652)  0.664 (0.560-0.762)  0.526 (0.433-0.614)
MMP7      0.570 (0.469-0.675)  0.544 (0.443-0.652)  0.536 (0.429-0.640)  0.557 (0.463-0.651)
MMP9      0.524 (0.420-0.628)  0.545 (0.441-0.654)  0.593 (0.486-0.692)  0.510 (0.420-0.601)
MSLN      0.527 (0.422-0.631)  0.464 (0.356-0.579)  0.488 (0.383-0.595)  0.532 (0.444-0.619)
PARK7     0.588 (0.484-0.691)  0.695 (0.595-0.783)  0.618 (0.512-0.713)  0.642 (0.556-0.727)
PPBP      0.550 (0.445-0.656)  0.549 (0.443-0.655)  0.482 (0.383-0.588)  0.549 (0.457-0.643)
PRG4      0.629 (0.529-0.728)  0.524 (0.422-0.628)  0.610 (0.514-0.707)  0.577 (0.491-0.664)
SPARCL1   0.473 (0.366-0.578)  0.509 (0.406-0.616)  0.548 (0.441-0.652)  0.491 (0.400-0.584)
SPP1      0.792 (0.708-0.867)  0.589 (0.481-0.693)  0.721 (0.624-0.809)  0.690 (0.609-0.769)
TGFBI     0.694 (0.591-0.786)  0.667 (0.570-0.766)  0.480 (0.377-0.581)  0.681 (0.585-0.770)
THBS1     0.571 (0.468-0.673)  0.590 (0.488-0.689)  0.662 (0.564-0.755)  0.509 (0.417-0.603)
TIMP1     0.686 (0.588-0.781)  0.532 (0.428-0.638)  0.657 (0.558-0.749)  0.609 (0.516-0.699)
TNFRSF1A  0.634 (0.529-0.735)  0.523 (0.426-0.628)  0.659 (0.562-0.751)  0.555 (0.462-0.651)
VEGFC     0.519 (0.413-0.618)  0.491 (0.389-0.596)  0.516 (0.410-0.620)  0.505 (0.411-0.593)

TABLE 6 Subject Characteristics

Subject Class                     Subject Group  No. cases  Median Age (Range), Years
Pancreatic Ductal Adenocarcinoma  Total          60         68.5 (49-92)
                                  Female         26         71.5 (49-91)
                                  Male           34         66 (49-92)
                                  Stage IA       4          74.5 (50-78)
                                  Stage IB       12         64.5 (50-92)
                                  Stage IIA      44         68.5 (49-91)
Healthy Control                   Total          60         63 (45-81)
                                  Female         26         64.5 (62-79)
                                  Male           34         59 (45-81)
Chronic Pancreatitis              Total          60         60.5 (47-86)
                                  Female         26         58 (47-86)
                                  Male           34         62.5 (53-83)

All patents and publications mentioned in this specification are herein incorporated by reference to the same extent as if each independent patent and publication was specifically and individually indicated to be incorporated by reference.

The following examples are provided by way of illustration, not limitation.

EXAMPLES

Example 1. Basigin is a Biomarker for Early Stage PDAC

BSG levels were measured by ELISA in plasma from 50 healthy control subjects (mean=3.06 ng/ml, 95% CI=2.6-3.5), 20 patients with chronic pancreatitis (3.92 ng/ml, 95% CI=3.18-4.65), and 50 pre-treatment samples from patients with PDAC (4.74 ng/ml, 95% CI=4.24-5.23). Nonparametric analysis revealed a significant difference for the model (P=0.0006, Kruskal-Wallis rank sums). By ANOVA and Tukey-Kramer post-hoc tests, the distributions of BSG levels in PDAC and healthy control subjects were significantly different (P<0.0001), but the comparison of PDAC to chronic pancreatitis cases was not significant (P=0.195). However, a striking difference was noted after separating the PDAC cases into early stage (I-II) and late stage (III-IV) groups (FIG. 1). ANOVA indicated that early stage disease (N=30, 5.56 ng/ml, 95% CI=4.96-6.16) had significantly elevated BSG levels compared to healthy controls (P<0.0001), chronic pancreatitis cases (P=0.005), and late stage PDAC cases (P=0.0002; N=20, 3.51 ng/ml, 95% CI=2.77-4.25). These data confirm the presence of detectable BSG in our plasma samples and suggest that circulating BSG is an early stage phenomenon during PDAC development.

PDAC cell line co-culture with PCAFs: The human pancreatic cancer cell lines AsPC-1, MIA PaCa-2, and PANC-1, stably transfected to express red fluorescent protein (RFP), have been created and maintained in our laboratory and are available for this study. Briefly, the effect of co-culture with pancreas cancer-associated fibroblasts (PCAF2) on RFP-MIA PaCa-2 growth was monitored by RFP fluorescence. Numbers below the data bars represent initial plating densities for each cell type. RFP fluorescence intensity was measured 3 hours after initial plating to allow for adherence (Day 0) and then daily for three days. Relative growth is presented as the Day 3/Day 0 RFP fluorescence ratio relative to the Day 3/Day 0 RFP fluorescence ratio of RFP-MIA PaCa-2 cells alone (open bar). Relative growth was monitored for a constant number of initially plated RFP-MIA PaCa-2 cells co-cultured with an increasing number of PCAF2 cells (FIG. 2A). The effect of co-culture with PCAF2 on RFP-MIA PaCa-2 cell growth was also monitored while keeping the initial total cell plating density constant (FIG. 2B). Data represent the combined results of 3-4 independent experiments. For each experiment, 4-6 replicate wells for each condition were performed in parallel. Bracket indicates conditions significantly elevated relative to RFP-MIA PaCa-2 cells alone by ANOVA and Fisher's PLSD post-hoc test (P<0.01). Currently, 40 human primary pancreatic cancer-associated fibroblasts have been isolated and characterized. To date, all of the isolated PCAFs demonstrate the ability to increase PDAC cell line proliferation 2-3 fold in co-culture (FIG. 2). These data illustrate the system that will be used in Example 4 to assess the effect of fibroblasts on BSG, MMP-2, and MMP-9 in co-culture supernatants.

Example 2. Plasma BSG, MMP-2, and MMP-9 Levels as Circulating PDAC Diagnostic and Prognostic Biomarkers

Differential levels of BSG in healthy control, chronic pancreatitis, early stage PDAC and late stage PDAC cases will be confirmed. MMP-2 and MMP-9, whose expression is controlled by BSG, will be evaluated as PDAC diagnostic and prognostic biomarkers. All three analytes will be compared to CA 19-9 levels, the current gold standard PDAC biomarker.

Experimental Approach: Plasma protein levels will be determined by ELISA (DEMP00 Human EMMPRIN/CD147 Quantikine ELISA Kit, MMP200 Total MMP-2 Quantikine ELISA Kit, DMP900 Human MMP9 Quantikine ELISA Kit; R&D Systems, Minneapolis, Minn.). CA 19-9 levels will also be measured for each sample (#6909 Gastrointestinal Cancer Antigen CA 19-9 Enzyme immunoassay; Diagnostic Automation; Calabasas, Calif.). The plasma samples will be diluted and analytes measured following the manufacturer's recommended protocol. Protein/antigen levels will be calculated by comparing absorbance readings against a calibration curve generated from standards of known concentration. For those samples whose initial measurements do not fall within the linear detection range, sample dilutions will be modified accordingly and the measurements repeated.

Levels of BSG, MMP-2, MMP-9, and CA 19-9 will be determined in plasma samples from 100 patients with histologically/cytologically confirmed PDAC, 100 healthy control subjects (gender-matched and age-approximated to PDAC cases), and 80 patients with chronic pancreatitis. PDAC samples will be selected to have equal representation of early stage and late stage cases. Sufficient samples for this study have been previously collected and banked in our institution. Sequential cases will be chosen for each group, where possible. Plasma samples from normal control subjects were obtained from healthy adults accompanying index patients. All blood samples were collected prior to treatment (except for stent placement in jaundiced PDAC cases), separated into the plasma component, and frozen for later analysis. For PDAC cases, bilirubin levels at the time of sample collection will be abstracted from patient charts.

Data Analysis and Interpretation: Linear models will be used to relate BSG, MMP-2, MMP-9, and CA 19-9 levels to gender, age, and stage. Additional comparisons for protein/antigen levels in PDAC cases will be made for bilirubin and treatment (patients receiving surgical or chemoradiation treatments expected to affect survival). Univariate Cox models will be employed for survival analyses, with multivariate models developed using significant univariate parameters. Correlation analyses will be performed to compare the relationships between BSG, MMP-2, and MMP-9 levels. Receiver operating characteristic curves will be determined and the area under the curve calculated as a comparative measure of diagnostic accuracy. Sensitivity and specificity will be determined at optimal (maximizing accuracy) and assigned threshold levels (yielding 95% specificity). Statistical analyses will be performed using “R” statistical computing software, version 3.0.2 or later (38). P values <0.05 will be considered significant.
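
As an illustration of the diagnostic accuracy measures named above, the following minimal Python sketch computes an area under the ROC curve in its rank (Mann-Whitney) form, the sensitivity and specificity at a given threshold, and the threshold that maximizes accuracy. The function names and the analyte levels are hypothetical and chosen only for illustration; the specification itself calls for R, and this sketch is not the analysis code used for the reported results.

```python
def roc_auc(cases, controls):
    """AUC as the probability that a random case value exceeds a random
    control value, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sens_spec(cases, controls, threshold):
    """A sample is called positive when its level is above the threshold."""
    sensitivity = sum(x > threshold for x in cases) / len(cases)
    specificity = sum(y <= threshold for y in controls) / len(controls)
    return sensitivity, specificity


def best_accuracy_threshold(cases, controls):
    """Scan every observed value as a candidate threshold and keep the one
    that classifies the largest fraction of samples correctly."""
    total = len(cases) + len(controls)
    best_t, best_acc = None, -1.0
    for t in sorted(set(cases) | set(controls)):
        correct = sum(x > t for x in cases) + sum(y <= t for y in controls)
        if correct / total > best_acc:
            best_t, best_acc = t, correct / total
    return best_t, best_acc


# Hypothetical analyte levels (arbitrary units), for illustration only.
controls = [2.1, 2.8, 3.0, 3.3, 3.9]
cases = [3.1, 4.2, 4.8, 5.5, 6.0]

auc = roc_auc(cases, controls)
sens, spec = sens_spec(cases, controls, threshold=3.5)
best = best_accuracy_threshold(cases, controls)
```

The pairwise form of the AUC is quadratic in the number of samples, which is acceptable for cohorts of the size described here.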

Although the proposed experiments are designed to characterize BSG, MMP-2, and MMP-9 as potential diagnostic and prognostic biomarkers, the major objective of the aim is to validate the preliminary findings that BSG is elevated in early stage PDAC compared to the other groups. Two specific comparisons are of particular interest:

BSG levels in stage I-II PDAC vs. stage III-IV PDAC: We will test the hypothesis that BSG levels will be elevated in plasma samples from early stage PDAC cases relative to samples from late stage cases. By design, the sample set can be split for analysis either as test and validation sets or as a combined set (with cross-validation). Based on the pilot data (σ=2.2, δ=1), a sample size of 50 has 88% power to detect a difference at α=0.05 using Student's t-test. The combined data set (N=100) will have greater than 99% power to detect a similar effect size.

BSG levels in stage I-II PDAC vs. ChPT: We will also test the hypothesis that BSG levels will be elevated in plasma samples from early stage PDAC cases relative to samples from chronic pancreatitis cases. Based on the pilot data (σ=2.4, δ=0.8), a sample size of 50 (recapitulating the pilot experiment) has 65% power to detect the difference at α=0.05, although there was greater power in the multi-group analysis. The combined data set (N=130) will have greater than 97% power to detect a similar effect size.

The preliminary data are compelling for the differential BSG levels between early and late stage cases, lending high confidence that the difference will be recapitulated. Such a result will justify continued investigation of BSG as a diagnostic marker, including evaluation in a larger sample set from different institutions, examination of BSG levels in other pancreatic diseases (ampullary adenocarcinoma, neuroendocrine tumors, benign and pre-neoplastic cystic lesions), and testing samples from high-risk subjects and pre-diagnostic samples from patients who subsequently developed PDAC. The preliminary evidence is less compelling for the comparison of early stage PDAC and ChPT, due to the higher variance in the ChPT group. Chronic pancreatitis is a persistent diagnostic problem for patients presenting with periampullary disease, and biomarkers that can distinguish PDAC from the benign inflammatory condition are desirable. However, the lack of discrimination by aggregate BSG levels would not preclude its use in diagnostic panels since other biomarkers could compensate. The potential for BSG levels to indicate early stage disease is the principal benefit.

Example 3. Assessment of Tumor BSG Levels in Resected Cases by Immunohistochemistry and Determination of the Relationship of BSG Expression to Extent of ECM Deposition in Cancer-Associated Stroma

The experiments are expected to shed light on the apparent biphasic distribution of plasma BSG levels observed in preliminary data, with 12 early stage PDAC patients exhibiting high levels, while 17 showed levels comparable to controls. One possibility is that plasma BSG levels correspond to the extent of desmoplasia in the primary tumor. Addressing this possibility will provide mechanistic insight and suggest further investigations, particularly with regard to stratification of patients for MMP or BSG inhibitor treatment. The experiment will also evaluate tissue BSG expression as a prognostic indicator.

Experimental Approach: Paraffin tissue blocks will be selected from clinical pathology specimens that have both cancer and associated stroma represented for a minimum of 10 cases from the high plasma BSG group and 10 cases from the low plasma BSG group. Sequential sections from each case will be stained with hematoxylin and eosin (H&E), Masson's trichrome, and BSG via immunofluorescence. Antibodies for BSG staining will be optimized for dilution, and antibody specificity will be evaluated by comparing sections stained using secondary antibody only and sections stained in the presence of excess antigen. Slides will be evaluated by investigators blinded to the sample identity using a four-point scale representing quartiles (0-100% or lowest-highest). H&E slides will be scored for inflammatory cell infiltration and used to evaluate tumor structure. BSG slides will be evaluated for immunofluorescent intensity and the percent of staining (coverage) in both cancer cells and associated stroma. Scanned images of trichrome-stained sections will be used to determine the extent of collagen deposition relative to the number of cancer cells.

Data Analysis and Interpretation: We will test the hypotheses that BSG expression in tumors is directly or inversely related to the extent of desmoplasia by comparing the distribution of standardized collagen deposition to BSG intensity and coverage scores. Models will be developed that evaluate the relationship of desmoplasia and BSG expression to inflammatory cell infiltration as well as BSG released into circulation. It is unclear if, at the time of resection, desmoplasia will have been depleted by the activity of remodeling factors such as BSG and MMPs or if extensive desmoplasia results in higher BSG expression. Establishing causality will require techniques such as those proposed in Example 4. Prognostic models for BSG tumor expression will also be developed similarly to those described in Example 2.

Example 4. Determination of BSG, MMP-2, and MMP-9 in Cell Culture Supernatants in Response to PCAFs in Co-Culture with PDAC Cell Lines

This example describes development of an in vitro system that will allow investigations into the mechanisms of BSG activity in response to the presence of stromal cells. Tissue culture allows modulation of specific factors. Gain-of-function experiments using expression plasmids and loss-of-function experiments using RNAi/stable hairpin expression or blocking antibodies are effective tools for dissecting mechanisms. The results of these experiments will be hypothesis generating and the system will allow subsequent investigation into temporal expression and spatial targeting information as well as mechanistic probing.

PCAF cells will be seeded in 96-well plates at a density of 5,000 cells/well as outlined in preliminary data. After 24 hours, RFP-tagged PDAC cells will be added (5,000 cells/well) and incubated at 37° C. for an additional 3 hours. RFP measurements will then be made (Day 0 reading). After 3 days in culture, the Day 3 RFP measurement will be taken and supernatants collected and cleared of cells and debris by centrifugation. Supernatants from four replicate wells will be combined and BSG, MMP-2, and MMP-9 levels will be determined by ELISA as described in Aim 1. Initially, RFP-AsPC-1 cell lines will be used with four PCAF lines, with additional PDAC cell lines and PCAFs used as directed by experimental results. Each experiment will be independently repeated four times.

Raw data will consist of BSG, MMP-2, and MMP-9 measurements from four independent experiments. Since PDAC cells grow more rapidly in co-culture, the contribution of BSG, MMP-2, and MMP-9 from these cells should be higher than that from PDAC cells grown alone. Therefore, ELISA measurements will be standardized to mean RFP readings for each group, which will also minimize plate effects. Since PCAFs grow much more slowly than PDAC cells, their differential contribution should be minimal. We will test the hypothesis that levels of one or more of the proteins will increase in co-culture supernatants relative to (additive) levels in supernatants from culture of the individual cells. Given its regulatory role, it is expected that if BSG levels increase, then MMP-2 and MMP-9 levels will also increase. If BSG levels remain unchanged in supernatants, expression studies (PCR, immunohistochemistry) will be performed. Possible explanations for this outcome would include the lack of effectors required for BSG release or that the system (PCAFs and PDAC cells) has been pre-programmed for specific BSG expression in the tumor. Decreased BSG levels, although not expected, can indicate the presence or induction of inhibitors.

Example 5. Development of an Accurate Diagnostic Biomarker Panel for Low Prevalence Cancers

This Example demonstrates that a highly accurate blood-based diagnostic panel can be developed from a reasonable number of individual serum biomarkers that are relatively weak classifiers when used singly. A panel constructed as described in this Example is advantageous in that a high level of specificity can be forced, accomplishing a prerequisite for screening asymptomatic populations for low-prevalence cancers.

Existing biomarkers, biomarker panels, and diagnostic algorithms fall well short of the accuracy levels required to bring the number of false-positive determinations in asymptomatic populations into an acceptable range. Brand et al., Clin Cancer Res 2011, 17:805-816; Firpo et al., World J Surg 2009, 33:716-722; Lee and Saif, JOP 2009, 10:104-108; Wingren et al., Cancer Res 2012, 72:2481-2490; Winter et al., J Surg Oncol 2013, 107(1):15-22. Since PDAC develops from multiple different combinations of genetic and possibly epigenetic lesions (Jones et al., Science 2008, 321:1801-1806; Ryan et al., Science 2012, 336:1513-1514), it seems logical that individual cancer cases may express a subset of markers while other cases express a different subset. Thus, attempts to identify a single test for discrimination of all PDAC cases may be frustrated because of disease heterogeneity. We developed mathematical models based on experimental data from nine serum biomarkers, all of which were significantly elevated in pancreatic cancer cases relative to controls. We asked if an accurate panel classifying tool could be developed from a group of these weak individual biomarkers and hypothesized that increased accuracy could be realized by allowing for multiple combinations of biomarkers, accommodating disease heterogeneity.

All studies were carried out with the approval of the University of Utah Institutional Review Board and written informed consent was obtained for each participant enrolled in the study protocols.

Serum levels of AXL, CA 19-9, haptoglobin, hyaluronic acid, MMP-7, MMP-11, osteopontin, serum amyloid A, and TIMP-1 were measured in sera from 117 healthy control subjects, 58 chronic pancreatitis patients, and 159 PA patients collected prior to treatment. Control serum samples were obtained from healthy adults accompanying index patients to clinic visits. Diagnoses of PDAC cases were confirmed by histological evaluation and consisted of a range of stages (10 stage IA or IB, 20 stage IIA, 47 stage IIB, 30 stage III, and 52 stage IV). Diagnostic and prognostic characteristics for CA 19-9 (Poruk et al., Curr Mol Med. 2013, 13(3):340-5), haptoglobin (Firpo et al., World J Surg 2009, 33:716-722), osteopontin (Poruk et al., Pancreas. 2013, 42(2):193-7), serum amyloid A (Firpo et al., World J Surg 2009, 33:716-722), and TIMP-1 (Poruk et al., Pancreas. 2013, 42(2):193-7) in our cohort have been previously published, as have prognostic characteristics for MMP-7 (Fukuda et al., Cancer Cell 2011, 19:441-455). Biomarker characterization for AXL, hyaluronic acid, and MMP-11 will be published elsewhere. The median number of biomarkers queried per sample was 6 and missing data points were imputed.

We modeled the behavior of a biomarker panel consisting of a sum of indicator variables, then chose a cutoff for the sum to force specificity to be high, and calculated the resulting sensitivity. To generate correlated biomarkers, we simulated independent normal random variables for each biomarker, and then added a common simulated random variable to each of them to introduce correlation. By varying the standard deviation of the common component, the correlation between the simulated biomarkers could be adjusted. We then made a 95th percentile cutoff for each simulated biomarker and assessed the performance as above. R statistical computing software version 2.8.0 (The R Foundation for Statistical Computing, Vienna, Austria) was used for the simulations.
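
The common-component construction described above can be sketched in Python as follows. This is a minimal illustration, not the R code used for the simulations: it assumes unit-variance independent noise for every biomarker and a shared component with standard deviation s, so that the pairwise correlation is s^2 / (1 + s^2). The function names, sample size, and seed are hypothetical.

```python
import math
import random


def simulate_correlated(n_subjects, n_markers, target_corr, rng):
    """Each simulated marker is independent N(0, 1) noise plus a component
    shared by all markers of the same subject. With shared standard
    deviation s, the pairwise correlation is s^2 / (1 + s^2), so s is
    chosen to hit the requested correlation."""
    s = math.sqrt(target_corr / (1.0 - target_corr))
    data = []
    for _ in range(n_subjects):
        common = rng.gauss(0.0, s)
        data.append([rng.gauss(0.0, 1.0) + common for _ in range(n_markers)])
    return data


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sim = simulate_correlated(20000, 3, 0.15, rng)
r01 = pearson([row[0] for row in sim], [row[1] for row in sim])
```

Increasing the standard deviation of the shared component raises the correlation between markers, which is the adjustment the paragraph above describes.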

Characteristics of Individual PDAC Biomarkers

To address the possibility of devising a test with 99% sensitivity and specificity, we sought to develop mathematical models based on experimental serum biomarker data. From previous experiments in which we determined levels of various biomarkers in serum from PDAC patients, chronic pancreatitis patients, and healthy subjects, we identified nine biomarkers whose mean levels were significantly elevated in PDAC cases relative to controls. These biomarkers were soluble AXL, CA 19-9, haptoglobin, soluble hyaluronic acid, matrix metallopeptidase 7 (MMP-7), matrix metallopeptidase 11 (MMP-11), osteopontin, serum amyloid A, and TIMP metallopeptidase inhibitor 1 (TIMP-1). Although the mean values of each of these biomarkers were significantly elevated in PDAC cases, accurate classification of individual results is problematic because of the large overlap of values within case and control groups. The individual biomarkers are thus weak diagnostic classifiers. The overlapping distributions for CA 19-9, haptoglobin, osteopontin, and TIMP-1 are shown in FIG. 5. This observed overlap is consistent with disease heterogeneity in that individual cancer cases may develop to express a subset of markers while other cases express a different subset.

For the nine biomarkers, a sample set from 117 healthy control subjects, 58 chronic pancreatitis patients, and 159 PA patients was identified for which at least 3 of the 9 biomarkers were measured in individual samples. The median number of biomarkers queried per sample was 6 and missing data points were imputed. This final data set was used to identify biomarker characteristics for model development. To prioritize high specificity, we first assigned a diagnostic threshold (the indicator variable) at the 95th percentile of control values on the individual biomarkers and then calculated the resulting sensitivity. Between 17% and 75% of the PDAC cases had values above the 95% specificity threshold with an average sensitivity for all biomarkers of 32%.
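
The thresholding step just described, placing the cutoff at the 95th percentile of control values and reading off the fraction of cases above it, can be sketched as below. The helper names and the numeric levels are hypothetical, and the percentile is taken as an order statistic of the controls, which is one of several reasonable percentile conventions.

```python
import math


def control_percentile_threshold(control_values, specificity=0.95):
    """Smallest control value t such that at least `specificity` of the
    controls are at or below t (an order-statistic percentile)."""
    ordered = sorted(control_values)
    # Small epsilon guards against floating-point error in the product.
    k = math.ceil(specificity * len(ordered) - 1e-9)
    return ordered[k - 1]


def sensitivity_at(case_values, threshold):
    """Fraction of cases whose level lies above the threshold."""
    return sum(v > threshold for v in case_values) / len(case_values)


# Hypothetical biomarker levels (arbitrary units), for illustration only.
controls = [float(v) for v in range(1, 21)]  # 20 controls: 1.0 .. 20.0
cases = [5.0, 8.0, 12.0, 18.0, 19.5, 21.0, 22.0, 25.0, 30.0, 40.0]

threshold = control_percentile_threshold(controls)
sensitivity = sensitivity_at(cases, threshold)
false_positive_rate = sum(v > threshold for v in controls) / len(controls)
```

Each biomarker then contributes an indicator (above or not above its own threshold), and these indicators are what the panel sums.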

Since direct correlation between biomarkers provides less diagnostic information than independent predictors, we also assessed the degree of correlation between the nine biomarkers within each group (PDAC, healthy controls, chronic pancreatitis). The correlation between the indicator variables was near zero in controls and slightly positive in PDAC cases. None of the biomarkers were highly correlated. The correlation in the PDAC samples had a mean of 0.15 and a median of 0.13, but was highly variable (range −0.12 to +0.44). The mean and median correlations in the controls were 0.12 and 0.088, respectively. Since the square of the correlation is the percentage of shared variation, markers shared about 2% of the variation in cases and 1-2% of the variation in controls. This could be an overestimate, as missing data were imputed.

Modeling PDAC Biomarker Panels

We modeled the behavior of a biomarker panel consisting of a sum of indicator variables, then chose a cutoff for the sum to force specificity to be high, and calculated the resulting sensitivity. To generate correlated biomarkers, we simulated correlated continuous biomarker data, made a 95th percentile cutoff for each biomarker and then assessed performance as above. The average correlation assumption was conservative in that we ignored inverse correlation in our modeling, which would tend to increase overall accuracy of the panel. Therefore, we also modeled the less conservative correlation assumption of 0.05.

Modeling results for three panels that required 99% panel specificity, but were derived using different sensitivity assumptions about the individual biomarkers, are shown in FIG. 6. The model demonstrated, for example, that a panel consisting of 40 biomarkers characterized individually by 32% sensitivity at 95% specificity would require any 7 biomarkers to be above the threshold and would result in a panel sensitivity of at least 99% (FIG. 6B). The addition of correlation assumptions reduced sensitivity for the 40-biomarker panel to 94% at an average correlation of 0.05 and 84% at an average correlation of 0.15. Increasing the mean sensitivity of the individual biomarkers from 19% to 42% in the panel not only reduced the number of biomarkers required for high accuracy, but also reduced the contribution of correlation between the individual biomarkers.
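
For the uncorrelated case, the behavior of a sum-of-indicators panel can be approximated with a binomial model. The sketch below assumes, for illustration only, that all biomarkers are independent and share the same individual sensitivity and specificity; the function names are hypothetical and this is not the model code behind FIG. 6, so its values need not match the figure exactly. With 40 biomarkers at 95% individual specificity, the smallest vote count that gives at least 99% panel specificity under this simplification is 7, in line with the text above.

```python
from math import comb


def tail_prob(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def min_votes_for_specificity(n_markers, marker_specificity, panel_specificity):
    """Smallest number of markers above threshold that must be required so
    that the panel specificity reaches the requested level."""
    false_positive = 1.0 - marker_specificity
    for k in range(1, n_markers + 1):
        if 1.0 - tail_prob(n_markers, false_positive, k) >= panel_specificity:
            return k
    return None


n_markers = 40
votes = min_votes_for_specificity(n_markers, 0.95, 0.99)
panel_specificity = 1.0 - tail_prob(n_markers, 0.05, votes)
# Panel sensitivity if each marker independently flags 32% of cases.
panel_sensitivity = tail_prob(n_markers, 0.32, votes)
```

Positive correlation between markers makes the vote counts more dispersed than the binomial model predicts, which is why the correlated simulations report lower panel sensitivity.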

The idea was conceived to generate a “strong classifier” panel from a group of “weak classifiers”, with the stipulation that the algorithm allow for heterogeneity of the disease. The goal of accommodating disease heterogeneity by allowing different biomarker subsets would increase the overall number of biomarkers necessary in the panel. Models developed using the characteristics of nine biomarkers measured in human samples revealed that panels with 99% specificity and sensitivity could be achieved using a reasonable number of biomarkers. The approach is advantageous in that a high level of specificity can be forced and demonstrates that accommodating heterogeneity in the system has the potential to improve accuracy of cancer diagnostic biomarker panels, particularly for low-prevalence cancers.

Although the main goal was to evaluate if increased accuracy could be realized by allowing for disease heterogeneity, one limitation of the experimental design is that the dataset used biomarker levels from all PDAC stages. To be effective at improving outcomes, any diagnostic screening test should be able to identify early stage, treatable cases. Whether or not these biomarkers exist for PDAC will require further confirmation. The likelihood of finding 30-50 biomarkers with at least the average levels of accuracy seen in the nine biomarkers used here seems reasonable given that 162 secreted proteins are routinely overexpressed in PDAC tumors (Harsha et al., PLoS Med 2009, 6:e1000046) and other biomarkers, such as degraded cell-surface proteins, miRNAs, genetic mutations, and metabolic products, could be incorporated to extend the panel. Since highly correlated biomarkers provide the same information, the most suitable biomarkers for inclusion in a panel will likely be those that identify different features of the disease. Finally, although increasing the accuracy of tests for low prevalence cancers would reduce the cost and distress associated with falsely positive determinations in screening of asymptomatic populations, an acceptable level for false-positive determinations is an open question that needs to be addressed by clinical discourse.

This Example demonstrates that mathematical modeling of existing serum biomarker data, by allowing for diverse responses between cases, allows for a biomarker panel to be devised that has greater than 99% accuracy for diagnosis of a low prevalence cancer, pancreatic ductal adenocarcinoma. The results suggest that limiting analysis to those biomarkers with only the highest accuracy may be counterproductive and provide a framework for identifying useful biomarker characteristics and minimizing biomarker correlation.

Example 6. Development of an Accurate, Blood-Based Assay to Screen Asymptomatic Patients for Pancreatic Ductal Adenocarcinoma (PDAC) Risk

With the intent of developing an accurate, blood-based assay that could be used to screen asymptomatic patients for pancreatic ductal adenocarcinoma (PDAC) risk, levels of 30 analytes (Table 4) were measured in each of 180 serum samples (Table 6) from healthy control subjects, chronic pancreatitis cases, and a unique cohort of early stage pancreatic cancer cases. Multiple analytical methodologies were applied to the resulting data set and diagnostic algorithms were identified with accuracies approaching 100%. With the most rigorous predictive cross-validated analyses available, greater than 83% accuracy is expected for classification of additional samples. The analytic framework allows the addition of other molecular signals. It is expected that increased accuracy can be realized by incorporating additional analytes into the resulting algorithms.

Individual Analyte Performance for Discrimination between Subject Classes. The distributions of serum analyte levels were compared by Kruskal-Wallis one-way analysis of variance by ranks (FIG. 9). Nineteen analytes (ALCAM, AXL, CA 19-9, CEACAM1, COL18A1, EPCAM, HA, HP, ICAM1, IGFBP2, IGFBP4, MMP2, MMP7, SPP1, PRG4, TGFBI, THBS1, TIMP1, TNFRSF1A) had significant differences between healthy controls (CON), chronic pancreatitis (ChPT), and early stage pancreatic cancer (PDAC) classes, while an additional four analytes (BAG3, BSG, LCN2, PARK7) trended towards significance.

Since age was a significant predictor of diagnosis (P=0.0006 by Kruskal-Wallis), and since gender was unequally distributed within the classes (Table 6), subsequent analyses were performed using adjusted data. Analyte data were log10-transformed and adjusted for age and gender using the control data. A natural spline with 2 degrees of freedom was used to adjust for age. Receiver operating characteristic (ROC) curves were generated for each analyte and areas under the ROC curve calculated (Table 5). ALCAM, AXL, CA 19-9, CEACAM1, COL18A1, HA, HP, ICAM1, IGFBP2, MMP2, PRG4, SPP1, TGFBI, TIMP1, and TNFRSF1A (15 analytes) showed capacity for discriminating healthy controls from early stage PDAC cases; ALCAM, AXL, BSG, CA 19-9, CEACAM1, EPCAM, HA, ICAM1, PARK7, and TGFBI (10 analytes) showed capacity for discriminating chronic pancreatitis cases from early stage PDAC cases; ALCAM, BAG3, CA 19-9, CEACAM1, COL18A1, EPCAM, HA, HP, ICAM1, IGFBP2, IGFBP4, LCN2, MMP2, PARK7, SPP1, THBS1, TIMP1, and TNFRSF1A (18 analytes) showed capacity for discriminating healthy controls from chronic pancreatitis cases; and ALCAM, AXL, CA 19-9, CEACAM1, COL18A1, HA, ICAM1, IGFBP2, PARK7, SPP1, TGFBI, and TIMP1 (12 analytes) showed capacity for discriminating both healthy controls and chronic pancreatitis cases from early stage PDAC cases.

The best performing analytes, yielding areas under the ROC curve greater than 0.70, were ALCAM, AXL, CA 19-9, CEACAM1, HA, ICAM1, IGFBP2, and SPP1 (8 analytes) for discriminating healthy controls from early stage PDAC cases; CA 19-9 and CEACAM1 (2 analytes) for discriminating chronic pancreatitis cases from early stage PDAC cases; EPCAM, ICAM1, IGFBP2, and SPP1 (4 analytes) for discriminating healthy controls from chronic pancreatitis cases; and ALCAM, CA 19-9, CEACAM1, and ICAM1 (4 analytes) for discriminating both healthy controls and chronic pancreatitis cases from early stage PDAC cases.

Performance of Analyte Panels for Discrimination between Subject Classes. A threshold voting scheme, illustrated in FIG. 10 and described in Firpo et al., 2014, Theor Biol Med Model, 11(1):34, was applied to the comprehensive ELISA data set. Using age/gender adjusted, log10 data, the 11 analyte panel consisting of ALCAM, AXL, CA 19-9, CEACAM1, HA, ICAM1, MMP2, PARK7, SPP1, TGFBI, and TIMP1, with any 3 analytes above the individual threshold (vote), yielded a specificity of 97% and a sensitivity of 72% for discriminating between healthy control subjects and early stage PDAC cases (FIG. 10, panel A). A 7 analyte panel consisting of ALCAM, CA 19-9, CEACAM1, HA, ICAM1, PARK7, and TGFBI, with any 3 analytes above the individual threshold, yielded a specificity of 98% and a sensitivity of 62% for discriminating both healthy controls and chronic pancreatitis cases from early stage PDAC cases (FIG. 10, panel B). In order to reduce the impact of selection bias and overfitting, the threshold voting scheme was applied to the CON and PDAC data using a cross-validation approach. Here, the sample sets were divided into 10 non-overlapping groups of roughly equal size, 9 of which were used as a training set and the remaining group as a test set to generate less biased estimates of error rates. This procedure was repeated 10 times to stabilize estimates. The selection of the analytes as well as the cutoff thresholds was part of the cross-validation (as is appropriate). Requiring specificity of 95% or higher and sensitivity of 25% or higher for each selected analyte, a 10 analyte panel consisting of ALCAM, AXL, CA 19-9, CEACAM1, HA, ICAM1, MMP2, SPP1, TGFBI, and TIMP1 yielded uncorrected sensitivity and specificity of 70% and 98.3%, respectively. The cross-validated sensitivity and specificity of 69.8% and 95.3%, respectively, illustrate the expected accuracy of the 10-analyte panel when applied to new samples for diagnostic purposes.
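The threshold voting rule can be sketched as follows. This is a minimal illustration, assuming each analyte casts one vote when its adjusted level exceeds its individual threshold and a sample is called positive when at least 3 votes are cast. The thresholds, sample values, and function names are hypothetical and are not taken from the study data; the cross-validated selection of analytes and cutoffs is not shown.

```python
def count_votes(sample, thresholds):
    """Number of panel analytes whose level exceeds its individual threshold."""
    return sum(1 for name, cutoff in thresholds.items() if sample[name] > cutoff)


def is_positive(sample, thresholds, min_votes=3):
    """Call a sample positive when at least `min_votes` analytes vote."""
    return count_votes(sample, thresholds) >= min_votes


def sensitivity_specificity(samples, is_case, thresholds, min_votes=3):
    """Sensitivity and specificity of the voting rule on labeled samples."""
    tp = fn = tn = fp = 0
    for sample, case in zip(samples, is_case):
        called = is_positive(sample, thresholds, min_votes)
        if case and called:
            tp += 1
        elif case:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical thresholds and adjusted log10 levels (illustration only).
THRESHOLDS = {"ALCAM": 1.0, "CA 19-9": 1.5, "CEACAM1": 0.8, "HA": 2.0}
SAMPLES = [
    {"ALCAM": 1.2, "CA 19-9": 1.9, "CEACAM1": 0.9, "HA": 1.0},  # case, 3 votes
    {"ALCAM": 0.9, "CA 19-9": 1.6, "CEACAM1": 0.7, "HA": 2.1},  # case, 2 votes
    {"ALCAM": 0.5, "CA 19-9": 1.0, "CEACAM1": 0.9, "HA": 1.0},  # control, 1 vote
    {"ALCAM": 0.8, "CA 19-9": 1.2, "CEACAM1": 0.5, "HA": 1.5},  # control, 0 votes
]
IS_CASE = [True, True, False, False]

sensitivity, specificity = sensitivity_specificity(SAMPLES, IS_CASE, THRESHOLDS)
```

Requiring several votes rather than one trades some sensitivity for specificity, which matches the high-specificity operating points reported for the panels.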

Performance of Predictive Algorithms for Discrimination between Subject Classes. The comprehensive data set was analyzed using various ensemble prediction methods (Table 4) for each of four class comparisons (CON vs. PDAC, ChPT vs. PDAC, CON vs. ChPT, and CON+ChPT vs. PDAC). To minimize bias in method selection, the ensemble algorithms themselves were also subjected to cross-validation using the SuperLearner prediction method, which seeks to find the optimal combination of a collection of prediction algorithms by minimizing the cross-validated risk. For each class comparison, the Random Forest method identified an algorithm that could categorize each sample to the correct class with 100% accuracy. For each class comparison, the SuperLearner identified an algorithm with the lowest mean squared error and cross-validated probability of correct classification of between 72% and 83.3%.

Brief Description of Algorithms.

Random forest: ensemble learning method for classification and regression that operates by constructing a multitude of decision trees (training) and outputting the class that is either the mode of the classes (classification) or mean prediction (regression) of the individual trees. The random forest technique is less prone to overfitting the training set than the decision tree technique.

Recursive partitioning: decision tree analysis that strives to correctly classify members of a population based on dichotomous independent variables. The resulting classification scheme is generally intuitively obvious, does not require a mathematical formula, and can be tuned to emphasize specificity or sensitivity, but is prone to overfitting.

Generalized linear model: a generalization of ordinary linear regression that allows for response variables that have arbitrary distributions (rather than normal distributions), and for an arbitrary function of the response variable (the link function) to vary linearly with the predicted values (rather than assuming that the response itself must vary linearly).

K nearest neighbors: a non-parametric method used for classification and regression in which the input consists of the k closest training examples in the feature space. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors, or a value is assigned that is the average of the values of its k nearest neighbors.

Neural network: learning algorithm used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown, consisting of sets of adaptive weights (i.e. numerical parameters that are tuned by a learning algorithm) that are capable of approximating non-linear functions of their inputs.

Support vector machine: supervised learning models with associated learning algorithms used for classification and regression analysis. Given a set of training examples, the training algorithm builds a model that assigns new examples into one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. SVMs can also perform a non-linear classification by mapping their inputs into high-dimensional feature spaces.

SuperLearner: a prediction method designed to find the optimal combination of a collection of prediction algorithms by assessing the combination of algorithms minimizing the cross-validated risk. The SuperLearner was originally described in van der Laan et al., (2007) Stat Appl Genet Mol Biol, 6:Article25 PMID: 17910531. Each of the aforementioned algorithms is known to those skilled in the art.
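As an illustration of one of the algorithms described above, the following is a minimal sketch of k nearest neighbors classification by majority vote. It assumes Euclidean distance in a two-analyte feature space; the training points, labels, and function name are hypothetical and serve only to show the voting step.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Assign `query` to the class most common among its k nearest neighbors."""
    # Pair each training example with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote over the k closest training examples.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical adjusted levels of two analytes (illustration only).
TRAIN_POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
TRAIN_LABELS = ["CON", "CON", "CON", "PDAC", "PDAC", "PDAC"]

low_query_class = knn_predict(TRAIN_POINTS, TRAIN_LABELS, (0.5, 0.5))
high_query_class = knn_predict(TRAIN_POINTS, TRAIN_LABELS, (5.5, 5.5))
```

An odd value of k avoids tied votes in a two-class comparison.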

In general, the predictive algorithms, including the SuperLearner, require the full analyte data set as input in the context of screening new samples. The specific analytes contributing to prediction are not readily accessible. However, for each class comparison, specific analytes were identified by the random forest prescreen applied to the generalized linear model. They were ALCAM, CA 19-9, CEACAM1, HA, ICAM1, IGFBP2, LCN2, SPP1, TGFBI, and THBS1 for discriminating healthy controls from early stage PDAC cases; ALCAM, CA 19-9, CEACAM1, EPCAM, ICAM1, MSLN, PARK7, SPARCL1, TGFBI, and TIMP1 for discriminating chronic pancreatitis cases from early stage PDAC cases; CEACAM1, EPCAM, HA, HP, ICAM1, IGFBP2, LCN2, MMP2, SPP1, and THBS1 for discriminating healthy controls from chronic pancreatitis cases; and ALCAM, AXL, CA 19-9, CEACAM1, COL18A1, HA, ICAM1, LCN2, MMP2, and TGFBI for discriminating both healthy controls and chronic pancreatitis cases from early stage PDAC cases.

The resulting algorithms can be applied to new samples for diagnostic screening. The expected accuracy of such tests is listed in Table 7 (Cross validated probability of correct classification). Furthermore, the algorithms can be modified to accommodate adaptive learning, as confirmatory diagnoses become available for new samples. Increased accuracy of the algorithms can be achieved by incorporating additional analytes (measured in each of the input samples), such as miRNA signals, tumor DNA signals, autoantibody signals, as well as additional peptide and polysaccharide signals.

TABLE 7
Algorithms applied to new samples for diagnostic screening

                   Uncorrected      Cross-validated    Cross-validated   Fractional
                   probability of   risk, mean         probability of    contribution in
                   correct          squared error      correct           SuperLearner
Algorithm          classification   (SE)               classification

Evaluation of algorithms for CON vs. PDAC
glmnet             0.875            0.137 (0.017)      0.825             0
randomForest       1.00             0.121 (0.016)      0.858             0.415
rpart              0.883            0.181 (0.029)      0.783             0
glm + prescreen    0.858            0.157 (0.024)      0.767             0.066
knn                0.850            0.137 (0.019)      0.833             0.491
nnet               0.917            0.147 (0.029)      0.842             0.028
svm                0.958            0.142 (0.019)      0.808             0
mean               0.500            0.254 (0.001)      0.400             0
SuperLearner                        0.115 (0.017)      0.833             -
CA 19-9            0.833                               0.824

Evaluation of algorithms for ChPT vs. PDAC
glmnet             0.767            0.201 (0.017)      0.692             0
randomForest       1.00             0.162 (0.015)      0.775             0.039
rpart              0.883            0.200 (0.029)      0.742             0.185
glm + prescreen    0.775            0.193 (0.024)      0.742             0.016
knn                0.800            0.169 (0.017)      0.725             0.613
nnet               0.900            0.223 (0.034)      0.733             0
svm                0.983            0.178 (0.019)      0.750             0.147
mean               0.500            0.253 (0.001)      0.417             0
SuperLearner                        0.171 (0.017)      0.742             -
CA 19-9            0.708                               0.713

Evaluation of algorithms for CON vs. ChPT
glmnet             0.808            0.198 (0.019)      0.708             0
randomForest       1.00             0.175 (0.015)      0.708             0.559
rpart              0.842            0.193 (0.026)      0.700             0
glm + prescreen    0.750            0.214 (0.027)      0.692             0.262
knn                0.733            0.207 (0.019)      0.675             0.173
nnet               0.933            0.275 (0.033)      0.633             0.006
svm                0.967            0.197 (0.019)      0.733             0
mean               0.500            0.256 (0.002)      0.383             0
SuperLearner                        0.193 (0.019)      0.717             -
CA 19-9            0.600                               0.590

Evaluation of algorithms for CON + ChPT vs. PDAC
glmnet             0.828            0.147 (0.018)      0.794             0.133
randomForest       1.00             0.128 (0.015)      0.867             0
rpart              0.883            0.141 (0.021)      0.833             0.171
glm + prescreen    0.861            0.147 (0.018)      0.822             0
knn                0.872            0.130 (0.017)      0.828             0.373
nnet               0.917            0.210 (0.016)      0.778             0.177
svm                0.922            0.134 (0.016)      0.828             0.145
mean               0.667            0.224 (0.011)      0.667             0
SuperLearner                        0.127 (0.016)      0.833             -
CA 19-9            0.767                               0.762

a) glmnet: a version of Lasso regression;
b) randomForest: random forest;
c) rpart: recursive partitioning;
d) glm + prescreen: the generalized linear model. The random forest prescreen was used to avoid overfitting, which caused non-convergence for some of the cross-validation models;
e) knn: k nearest neighbors;
f) nnet: neural network;
g) svm: support vector machine;
h) mean: the predicted probability is the overall mean for every case (=0.5 for the full data);
i) SuperLearner: the NNLS algorithm was used to fit the SuperLearner, with the binomial family. 10-fold cross validation was used both within the SuperLearner to evaluate each algorithm and to evaluate the SuperLearner.
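The fractional contributions in Table 7 can be read as weights in a combined prediction. The sketch below assumes that the SuperLearner prediction for a sample is the weighted average of the component algorithms' predicted probabilities, using the non-negative weights reported for the CON vs. PDAC comparison. The per-algorithm predictions shown are hypothetical, and the function names are illustrative rather than part of any SuperLearner software interface.

```python
# Fractional contributions for CON vs. PDAC from Table 7; algorithms with a
# contribution of 0 are omitted because they do not affect the combination.
CON_VS_PDAC_WEIGHTS = {
    "randomForest": 0.415,
    "glm + prescreen": 0.066,
    "knn": 0.491,
    "nnet": 0.028,
}


def combined_probability(predictions, weights=CON_VS_PDAC_WEIGHTS):
    """Weighted average of per-algorithm predicted probabilities."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * predictions[name] for name in weights)
    return weighted_sum / total_weight


def mean_squared_error(probabilities, outcomes):
    """Risk used to compare algorithms: mean of (prediction - outcome)^2."""
    squared = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


# Hypothetical predicted probabilities of PDAC for one new sample.
sample_predictions = {
    "randomForest": 0.90,
    "glm + prescreen": 0.70,
    "knn": 0.80,
    "nnet": 0.60,
}
sample_probability = combined_probability(sample_predictions)
```

Dividing by the total weight keeps the result a valid probability even if rounded weights do not sum exactly to 1.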

Example 7. Performance of the Diagnostic SuperLearner Algorithm Compared to CA 19-9 Alone

This example demonstrates that the SuperLearner has improved performance over CA 19-9 alone when applied to additional samples.

To evaluate the SuperLearner over all threshold cutoffs, receiver operating characteristic analyses were performed. The SuperLearner provides predicted probabilities for each observation and these predicted probabilities can be used to generate an ROC curve in the same way that an ROC curve can be generated from any other continuous variable (such as the values from a single analyte). For each possible predicted probability value there is a true positive fraction and a false positive fraction, and one can plot them against each other. To generate unbiased estimates, the predicted probabilities were generated by cross-validation in which the SuperLearner was trained on 90% of the data from the 30 analytes listed in Table 1, and predictions were made on the other 10%. This was repeated 10 times, with a different 10% of the data in the validation set each time. Similarly, for the individual analytes (Table 3 and FIG. 7), the ROC AUCs reported were generated by bootstrap resampling (2000×) and represent estimated error rates from samples in the test sets. The cross-validated and bootstrap AUCs thus reflect the expected accuracy in new sample sets. The CA 19-9 ROC curve and AUC are comparable to the cross-validated SuperLearner ROC curve and AUC (FIG. 7). The relative improvement of the cross-validated SuperLearner over CA 19-9 alone was 35%, [(0.908−0.858)/(1−0.858)]×100.
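The relative improvement figure can be checked with a short calculation. The sketch below assumes relative improvement is defined as the share of the reference test's remaining AUC gap (1 minus its AUC) that is closed by the new test, which is the form of the bracketed expression above. The function name is illustrative.

```python
def relative_improvement(auc_new, auc_reference):
    """Percent of the reference's remaining AUC gap closed by the new test."""
    return (auc_new - auc_reference) / (1.0 - auc_reference) * 100.0


# Cross-validated SuperLearner AUC (0.908) versus CA 19-9 alone (0.858).
improvement = relative_improvement(0.908, 0.858)  # about 35.2, reported as 35%
```

Expressing the gain relative to the remaining gap, rather than as a raw AUC difference of 0.05, reflects how little headroom is left above an AUC of 0.858.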

Example 8. Feature Selection Maximizing Area Under the Receiver Operating Characteristics Curve

The ensemble techniques illustrated in Table 7 function by minimizing the mean squared error associated with the diagnostic test. We also evaluated the 30-analyte panel using a technique that maximizes area under the receiver operating characteristic curve and identifies analytes that most contribute to class discrimination using the lasso (least absolute shrinkage and selection operator) procedure. As devised, the lasso procedure seeks to reduce over-fitting of the data, which is evident in the overlap between the fitted data (FIG. 8, Apparent Curve) and the resampled data (FIG. 8, Bootstrap Curve). For the discrimination of healthy controls from early stage PDAC (FIG. 8A), the optimal panel included analytes BAG3, CA 19-9, CEACAM1, HA, IGFBP2, PARK7, and SPP1. For the discrimination of healthy controls+chronic pancreatitis from early stage PDAC (FIG. 8B), the optimal panel included analytes BAG3, CA 19-9, CEACAM1, EPCAM, LCN2, MSLN, PARK7, PRG4, SPP1, and TNFRSF1A. These data suggest that comparably accurate algorithms can be devised using subsets of the 30 analyte panel.

Example 9. Identification of Novel Biomarkers for Early Stage PDAC

This example describes identification of novel biomarkers for diagnosis of early stage PDAC.

The analytes listed in Tables 4 and 5 feature 8 novel analytes (ALCAM, BAG3, BSG, HA, PARK7, PRG4, SPARCL1, and TGFBI) that have not been previously systematically evaluated for their ability to contribute to early stage PDAC diagnosis. All 8 analytes likely contribute at least orthogonal information to the ensemble algorithms, including the SuperLearners. Additionally, 7 of these 8, ALCAM, BAG3, HA, PARK7, PRG4, SPARCL1, and TGFBI, were specifically identified as contributing to diagnostic algorithms, either in the threshold-voting scheme, the Random Forest prescreen to the generalized linear model, or by feature selection via maximizing the area under the receiver operating characteristic curve. Several of these analytes were identified in multiple algorithms. The eighth novel analyte, BSG, was shown to be specifically elevated in plasma samples from early stage PDAC cases (FIG. 1), providing support for the contribution of BSG to orthogonal information in the ensemble algorithms.

From the foregoing description, it will be apparent that variations and modifications can be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.

The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.

All patents and publications mentioned in this specification are herein incorporated by reference to the same extent as if each independent patent and publication was specifically and individually indicated to be incorporated by reference.

What is claimed is:
1. A panel of isolated biomarkers comprising N of the biomarkers listed in Tables 1 through 4.
2. The panel of claim 1, wherein N is a number selected from the group consisting of 2 to 30.
3. The panel of claim 2, wherein said panel comprises at least two isolated biomarkers selected from the group consisting of the biomarkers set forth in Tables 1 through 4.
4. The panel of claim 2, wherein said panel comprises basigin (BSG).
5. The panel of claim 2, wherein said panel comprises soluble hyaluronic acid (HA).
6. A method of determining a probability for early stage pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability for early stage PDAC in said individual.
7. The method of claim 6, wherein said measurable feature comprises fragments or derivatives of each of said N biomarkers selected from the biomarkers listed in Tables 1 through 4.
8. The method of claim 6, wherein said detecting a measurable feature comprises quantifying an amount of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4, combinations or portions and/or derivatives thereof in a biological sample obtained from said individual.
9. The method of claim 8, further comprising calculating the probability for early stage PDAC in said individual based on said quantified amount of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4.
10. The method of claim 6, further comprising an initial step of providing a biomarker panel comprising N of the biomarkers listed in Tables 1 through 4.
11. The method of claim 6, further comprising an initial step of providing a biological sample from the individual.
12. The method of claim 6, further comprising communicating said probability to a health care provider.
13. The method of claim 12, wherein said communication informs a subsequent treatment decision for said individual.
14. The method of claim 6, wherein N is a number selected from the group consisting of 2 to 30.
15. The method of claim 14, wherein said N biomarkers comprise basigin (BSG).
16. The method of claim 14, wherein said N biomarkers comprise soluble hyaluronic acid (HA).
17. The method of claim 6, wherein said analysis comprises a use of a predictive model.
18. The method of claim 17, wherein said analysis comprises comparing said measurable feature with a reference feature.
19. The method of claim 6, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
20. The method of claim 19, wherein the biological sample is serum.
21. The method of claim 8, wherein said quantifying comprises an assay that utilizes a capture agent.
22. The method of claim 21, wherein said capture agent is selected from the group consisting of an antibody, antibody fragment, nucleic acid-based protein binding reagent, small molecule or variant thereof.
23. The method of claim 22, wherein said assay is selected from the group consisting of enzyme immunoassay (EIA), enzyme-linked immunosorbent assay (ELISA), and radioimmunoassay (RIA).
24. The method of claim 6, further comprising detecting a measurable feature for one or more risk indicia.
25. A method of determining a probability for recurrence of PDAC in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability for recurrence of PDAC in said individual.
26. A method of determining a probability of a positive response to therapy for pancreatic ductal adenocarcinoma (PDAC) in an individual, the method comprising detecting a measurable feature of each of N biomarkers selected from the biomarkers listed in Tables 1 through 4 in a biological sample obtained from said individual, and analyzing said measurable feature to determine the probability of a positive response to therapy for PDAC in said individual.
27. The method of claim 26, wherein said therapy is selected from the group consisting of prophylactic anticoagulants, resection, neoadjuvant chemotherapy and chemoradiation.